THE SUPERVISORY INVENTORY:
A FORCED-CHOICE MEASURE OF HUMAN RELATIONS 
ATTITUDE AND TECHNIQUE 


SOLOMON L. SCHWARTZ AND NORMAN GEKOSKI


Temple University 


The purpose of this research was to develop a new measurement of
human relations attitude and skill. Feeling that present measures
(a) were too susceptible to deliberate manipulation by the respondent,
(b) contained items whose validities were not statistically determined
prior to their inclusion, or (c) contained items insufficiently
pertinent to everyday supervisory situations, the authors set out to
develop an instrument to lessen some of these objections.


METHOD 


The use of the forced-choice technique and other methods served the
accomplishment of these objectives (Berkshire & Highland, 1953).


Population 


The overall population participating in the construction phase of this
inventory consisted of 131 white, male supervisors employed in a large
steel plant in the Greater Philadelphia area. Approximately 80% of the
total number of supervisors in the plant participated in the study.
Each department of the plant was represented, the majority of
supervisors coming from production departments. The age range of this
population was from 28 to 5[?], the bulk of the cases falling in the
35 to 45 age bracket. A very large percentage of the group was made up
of front-line supervisors. Included in the population were individuals
designated as (a) foremen, (b) supervisors, (c) group leaders, and
(d) department heads. The average length of time in a supervisory
position for this group was eight and one-half years.

The population used in the validation phase of this study consisted of
73 supervisors employed in another industrial plant in the
Philadelphia area. In the main, the characteristics of this group were
similar to those of the original group.


Item Collection 


The usefulness of the items to be included in the inventory was
evaluated by the following major criteria:

1. They should be easily understood by all supervisors to yield
maximum readability.

2. They, collectively, should cover a broad range of supervisory
attitudes and behavior rather than be restricted to a narrow area.

3. They should be pertinent to all supervisors and pertain to the
aspects of industrial life and supervisory responsibilities which
confront supervisors every day.

The sentence completion technique was used to accumulate items
(Gekoski & Isard, 1955). From 300 sentence completion stems 120 were
selected which would best elicit a variety of useful responses. Three
sentence completion forms, each containing 40 stems, were constructed.
The forms were similar in that they elicited expressions of attitude
about similar aspects of industrial life. The specific stems, however,
were different. Each form was reviewed by four persons skilled in
supervision. To increase the likelihood of getting expressions of real
feelings, the respondents were asked not to identify themselves. A
total of 70 supervisors participated in filling out the sentence
completion blanks. The final pool of statements numbered 1668.


Determination of Attitudinal Area 


Each of the statements was classified by 
the nature of its content. A total of 37 cate- 
gories evolved. 


The categories were combined by the au- 
thors on the basis of similarity of topics. This 
classification yielded four major areas. The 
four areas were as follows:

Area I Management: feeling toward top 
management, pay, company policy, benefits, 
plant regulations, company training, and other 
aspects over which the supervisor has little 
control. 
Area II Supervision: attitude toward the
duties and responsibilities of a supervisor, his
annoyances, desires, and needs, the characteristics
which make for an “ideal” supervisor,
and feelings toward other supervisors.

Area III Employees: attitude toward the 
supervisor’s subordinates. 

Area IV Human Relations Practices: su- 
pervisory techniques for handling problems, 
troublemaking, lateness, apathy, arguments, 
low morale. 

The practical consideration of time dic- 
tated the limiting of statements to 300. The 
75 best statements within each area were then 
selected. An equal number from each area 
was designated for two Preliminary Super- 
visory Inventory forms. A system for place- 
ment of items in each form was devised. The 
objective here was to intersperse the state- 
ments of each area systematically. 


Determination of Preference Indices


The next step was to determine the pref- 
erence value for each item. Form A of the 
Supervisory Inventory was administered to a 
group of 64 supervisors. Another group of 67 
supervisors responded to Form B. The men 
were instructed to rate on a five-point scale 
how favorable it is for a supervisor to hold 
each opinion. The supervisors were asked not 
to be concerned with whether they agreed or
disagreed with the statement but merely to 
determine how complimentary each statement 
was. 

The mean value assigned to each statement 
was designated as the preference index for 
that item. The item preference indices ranged 
from 1.32 to 4.87. The lower the value, the 
more favorable or complimentary the item. 


Determination of Item Discrimination Indices 


The population participating in the pref- 
erence index phase also took part in deter- 
mining the discriminative power of each item. 
A total of 123 supervisors were given both 
forms of the Preliminary Supervisory Inven- 
tory. These supervisors were divided into two 
groups. Each member of the first group
(N = 61) was instructed to think of the worst
supervisor he knew—an actual person whom 
he could describe accurately. The bases for 
selecting this individual were carefully ex- 
plained. It was emphasized that the selection 
of the “poor” supervisor was not on the basis 
of production, but rather upon such points
as his attitude toward various aspects of in- 
dustrial life and his ineffectiveness in han- 
dling employee problems. Each member of 
the other group (N = 62) was instructed to
think of the “best” supervisor he knew. Again 
the bases for making this decision were care- 
fully explained. 

To obtain the discriminative power of each 
statement, the numbers agreeing with each 
item in the “best” and “worst” groups were 
compared. The statistic used was chi square. 
The chi square value, the corresponding level 
of significance, and a designation of whether 
agreement with the statement referred to the 
“high” or “low” group were noted. Those 
items with chi square values at the 1, 2, or 
5% levels of significance were considered sig- 
nificantly differentiating items. A total of 64 
statements was found to be significant at 
those levels. 
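
The comparison described above can be sketched as a chi square test on a 2 x 2 table (best or worst group by agree or disagree). This is only an illustration: the agreement counts are hypothetical, the function names are invented for the example, and the source does not say whether a continuity correction was applied, so none is used here.

```python
CRITICAL_05 = 3.841  # chi square critical value for 1 df at the 5% level


def chi_square_2x2(agree_best, n_best, agree_worst, n_worst):
    """Pearson chi square (1 df, no continuity correction) for a 2 x 2 table."""
    a, b = agree_best, n_best - agree_best      # "best" group: agree, disagree
    c, d = agree_worst, n_worst - agree_worst   # "worst" group: agree, disagree
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin carries no information
    return n * (a * d - b * c) ** 2 / denom


def discrimination_index(agree_best, n_best, agree_worst, n_worst):
    """Return (chi square, significant at 5%?, group in which agreement is more common)."""
    chi = chi_square_2x2(agree_best, n_best, agree_worst, n_worst)
    direction = "high" if agree_best / n_best > agree_worst / n_worst else "low"
    return chi, chi >= CRITICAL_05, direction


# Hypothetical item: 45 of 62 "best" raters and 20 of 61 "worst" raters agree.
chi, significant, direction = discrimination_index(45, 62, 20, 61)
print(round(chi, 2), significant, direction)
```

The group sizes of 62 and 61 are those reported in the text; everything else in the example is assumed.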


Construction of the Forced-Choice Scale 


The tetrad format was used. An item from 
each of the four designated attitudinal areas 
is represented in each group. A discriminating 
item in one area was paired with a nondis- 
criminating item of equal preference value 
but from another area. The preference value 
of the nondiscriminating item differs by less 
than 0.10 from that of the discriminating item.
A tetrad was formed by combining, for example,
a pair of items with positive preference
value representing Areas I and II (one
of which discriminated) with a pair of nega- 
tive preference values representing Areas III 
and IV (one of these discriminating). This 
procedure was followed in all tetrads. A dif- 
ference of 1.50 in preference values of com- 
bined pairs was adhered to. 
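
The pairing rule can be sketched in a few lines. The four statements and their preference values come from the example tetrad printed in this article; the extra Area I item, the `Item` class, and the helper names are hypothetical. Reading the 1.50 requirement as a minimum gap between the mean preference values of the two pairs is an assumption, since the text does not specify how the difference was computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str
    area: str
    preference: float
    discriminating: bool


def find_partner(target, pool, max_diff=0.10):
    """Closest nondiscriminating item from another area within max_diff, or None."""
    candidates = [
        it for it in pool
        if not it.discriminating
        and it.area != target.area
        and abs(it.preference - target.preference) < max_diff
    ]
    return min(candidates,
               key=lambda it: abs(it.preference - target.preference),
               default=None)


def pair_gap(pair_a, pair_b):
    """Absolute difference between the mean preference values of two pairs."""
    def mean(pair):
        return sum(it.preference for it in pair) / len(pair)
    return abs(mean(pair_a) - mean(pair_b))


merit = Item("Promotions are made only on merit.", "I", 2.24, True)
slow = Item("The slow worker is a safe worker.", "III", 3.86, True)
pool = [
    Item("The best-liked supervisor sticks up for his men.", "II", 2.26, False),
    Item("Tell him to be patient until the raise comes.", "IV", 3.87, False),
    Item("Hypothetical filler item from the same area.", "I", 2.25, False),
]

pair_1 = (merit, find_partner(merit, pool))
pair_2 = (slow, find_partner(slow, pool))
print(pair_1[1].area, pair_2[1].area, pair_gap(pair_1, pair_2) >= 1.50)
```

The same-area filler item is skipped even though its preference value is closer, which is the point of the "from another area" condition.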

A summary of the characteristics of the 
method employed for constructing the scale 
follows: 


1. Each area is represented in each tetrad.

2. Each area is represented by a discriminating
item with positive preference
value an equal number of times.

3. Each area is represented by a discriminating
item with negative preference
value an equal number of times.

4. Each area is represented as a nondiscriminating
item an equal number of times.

5. Each area is paired with every other
area an equal number of times.


Below is an example of one tetrad. The 
area and preference value for each statement 
are noted. Discriminating items are marked 
with an asterisk: 


*Area I 2.24 Promotions are made only on 
merit. 

Area II 2.26 The best-liked supervisor is 
the one who sticks up for his men at 
all times. 

*Area III 3.86 The slow worker is a safe 
worker. 

Area IV 3.87 The best way to handle an 
employee who often asks for a raise is 
to tell him to be patient until it comes. 


Scoring the Inventory 


A score of 1 is given for each discriminating item selected. It is
possible to obtain a score from −2 to +2 for each tetrad. A +1 score
is achieved by selecting a discriminating item as “most agree” which
had been statistically determined to indicate the “good” supervisor,
or by selecting a discriminating item as “least agree” which had been
found to indicate the “poor” supervisor. A score of −1 is assigned for
the selection of a discriminating item in the wrong direction, i.e.,
the selection as “most agree” of an item which had indicated a poor
supervisor. Selection of a nondiscriminating item receives no score.
The possible range for the entire inventory is from +48 to −48. In
addition, scores for each of the four attitudinal areas are
obtainable, so that both a total score and area scores are available.
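
A minimal sketch of the scoring rule follows. The keying of the sample tetrad is hypothetical (the article does not report which way each discriminating item is keyed), and the function names are invented for the example. The −1 for choosing a "good" item as "least agree" is inferred from the stated −2 to +2 range rather than spelled out in the text.

```python
from typing import Optional, Sequence, Tuple

# Each tetrad is described by the keying of its four items:
# "good", "poor", or None for a nondiscriminating item.
Keys = Sequence[Optional[str]]


def score_tetrad(keys: Keys, most: int, least: int) -> int:
    """Score one tetrad from the positions marked 'most agree' and 'least agree'."""
    if most == least:
        raise ValueError("most-agree and least-agree must be different items")
    score = 0
    if keys[most] == "good":
        score += 1
    elif keys[most] == "poor":
        score -= 1
    if keys[least] == "poor":
        score += 1
    elif keys[least] == "good":
        score -= 1
    return score


def score_inventory(tetrads: Sequence[Keys],
                    responses: Sequence[Tuple[int, int]]) -> int:
    """Total score: the sum of tetrad scores over the whole inventory."""
    return sum(score_tetrad(k, m, l) for k, (m, l) in zip(tetrads, responses))


# Hypothetical keying: first item keyed "good", third keyed "poor".
example = ("good", None, "poor", None)
print(score_tetrad(example, most=0, least=2))       # both choices in the keyed direction
print(score_inventory([example] * 24, [(0, 2)] * 24))  # upper limit for 24 tetrads
```

With 24 tetrads and at most 2 points each, the totals stay within the ±48 range given in the text.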


Determination of Validity and Reliability 


The final scale was administered to 73 su- 
pervisors employed in another plant in the 
Philadelphia area. This industrial concern 
produced machine parts. Although the ma- 
jority of the supervisory Ss were from pro- 
duction departments, each department of the 
plant was represented. The characteristics of 
this validation group were similar to those 
found in the inventory-construction group. 
Ratings were obtained on all supervisors par- 
ticipating in the validation project. Scores for 
each of the four areas and a total score were 
obtained. 

Analysis revealed that each set of scores
approximated a normal distribution. Hence,
the validities of the attitude inventory scores
were computed using the Pearson product-moment
correlation.

Interrelationships of area scores were ex- 
amined by correlating scores in each area 
with the scores in every other area. 

An estimate of the reliability of the atti- 
tude scale obtained by correlating scores on 
the odd-numbered tetrads with those on the 
even-numbered tetrads is .84 (.91, corrected 
by Spearman-Brown). 
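
The correction quoted above can be checked with the standard Spearman-Brown formula, r_full = k r / (1 + (k − 1) r), with k = 2 for a test twice the length of each half. The function name is chosen for the example only.

```python
def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Projected reliability of a test k times as long as the one yielding r_half."""
    return k * r_half / (1 + (k - 1) * r_half)


# Odd-even correlation of .84 reported for the inventory.
print(round(spearman_brown(0.84), 2))
```

Applied to the reported odd-even correlation of .84, the formula gives the corrected value of .91 cited in the text.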


The Criteria


For purposes of validating the final forced-choice Supervisory
Inventory, ratings were made by two plant psychologists on the 73
participating supervisors. The ratings covered two broad areas:
(a) attitudinal characteristics and (b) administrative and production
skills. For each rated characteristic the forced-distribution method
was employed. The attitudinal ratings ranged from 12 to 44, with a
mean of 26.90 and a standard deviation of 7.78. The administrative and
production ratings ranged from 8 to 39, with a mean of 24.88 and a
standard deviation of 7.25.


Validity of the Inventory


Pearson product-moment correlation coefficients were obtained between:

1. Inventory scores (area and total) and their counterparts on the
“attitude” criterion.

2. Inventory scores (area and total) and their counterparts on the
“administrative and production” criterion.

The results (Table 1) indicate that the total score is closely related
to attitudinal factors and shows some relationship to production and
administrative aspects of supervision. They indicate also that there
is a consistent and significant relationship between scores in each
attitudinal area and an outside criterion of attitude. Significant
relationships are also noted between inventory area scores and the
administrative-production criterion.

Table 2 presents the intercorrelations between inventory area scores
and the correlations between area scores and the total score.


CONCLUSIONS 


The Supervisory Inventory holds out considerable promise as both a
diagnostic instrument and as a selection device. Although further
refinement and validation are necessary, the present validities
obtained are very encouraging. The applicability of the forced-choice
rationale and format is demonstrated. Its susceptibility to deliberate
faking needs yet to be tested.


TABLE 1

MEANS, STANDARD DEVIATIONS, AND RANGES OF INVENTORY SCORES,
AND THEIR CORRELATIONS WITH THE CRITERIA (N = 73)

[Table entries are largely illegible in this copy. Rows: Management,
Supervisors, Employees, Human Relations, Total Score. Columns: mean,
standard deviation, range, and correlations with the attitude and
Adm.-Prod. criteria.]


TABLE 2

INTERCORRELATIONS BETWEEN INVENTORY AREA SCORES AND CORRELATIONS
BETWEEN AREA SCORES AND THE TOTAL SCORE (N = 73)

[Table entries are illegible in this copy. Rows: Management,
Supervisors, Employees, Human Relations. Columns: Management,
Supervisors, Employees, Total.]


SUMMARY


Human relations knowledge and skill have, for some time, been
recognized as a major component of effective supervision. Many
attempts have been made to measure this characteristic. In the present
instrument, the forced-choice technique has been invoked. This has
been done in the hope that bias and fakability, inherent in most
inventories, could be minimized.

The participating Ss were supervisors in two large industrial plants
in the Philadelphia area. Items for the forced-choice scale were
obtained through the use of a sentence-completion blank. Four major
attitudinal areas emerged when the items were classified. These areas
were designated as: (a) “Management,” (b) “Supervisors,”
(c) “Employees,” and (d) “Human Relations Techniques.” The mean scale
value of social acceptability assigned an item by the participating
supervisors was its preference index. The discrimination index was
obtained by having supervisors indicate agreement or disagreement with
each item in accordance with how the “worst” (or “best”) supervisor
they knew would respond. Chi square values represented the
discrimination indices. Systematic procedures were used for pairing
significantly discriminating items with nondiscriminating ones, for
combining pairs into tetrads, for ordering the items within each
tetrad, and for placing the tetrads in the final scale. The result was
a forced-choice inventory of 24 tetrads.

The inventory was validated on a new group of supervisors on whom
ratings had been made by their superiors. The areas covered by these
ratings were (a) attitudinal characteristics similar to those measured
in the inventory and (b) productive characteristics.

An odd-even reliability estimate for the inventory was .91. A
significant relationship was noted between total inventory scores and
the rated characteristics of attitude (Pearson r = .61). An r of .33
was obtained in relating total inventory scores to ratings of
productivity. Correlations between area scores and corresponding area
attitudinal ratings ranged from .45 to .55; between area scores and
their counterparts in productive characteristics, from .35 to .46.
Intercorrelations between area scores were .26 to .32.


CROSS-VALIDATION OF AN IBM PROOF MACHINE TEST BATTERY¹


JOHN B. HARKER


Cole and Associates, Boston, Massachusetts 


During the period 1949-1951, a committee of bankers under the auspices
of the American Bankers Association conducted a study to investigate
the usefulness of aptitude tests in the selection of clerical workers
for banks. In this, they had the assistance of the Psychological
Corporation as consultant together with psychologists from three
business organizations and six universities. The results of the work
of this committee are fully reported in Clerical Testing in Banks
(American Bankers Association, 1952). This report included a number of
studies supporting the use of aptitude tests in the selection of
clerical workers for banks, including several based on a work sample
designed for use with the IBM proof machine. Some very encouraging
relationships were reported between several tests and the total time
required by an operator to complete the Proof Machine Work Sample.
However, these were all concurrent studies, so that a following
cross-validation was clearly indicated.


PROBLEM 


This paper is a report of the experience of one large bank in the
determination of multiple cutoff scores for the selection of proof
machine operators based on a concurrent validity study employing the
work sample, and a subsequent follow-up study on a group of
experienced operators tested before hiring and training. The follow-up
study has added significance because of the use of actual production
data as the criterion. While the work sample possesses face validity,
it is not as difficult as the actual job. For example, in the work
sample all dollar amounts are very clearly written on the specimen
checks; the sorting of the checks is done by clearly written
compartment numbers and the operator is not slowed by a search for,
and the correction of, errors. By contrast, in the normal situation,
the operator must mentally classify a signature, bank name, or transit
number into the sorting system; read poorly written numerical amounts;
and locate and correct errors at the expense of production. In spite
of these differences, actual production and work sample scores were
reported as correlating .58 in two separate studies (American Bankers
Association, 1952). While such validity is high enough to justify the
use of the work sample as a criterion when other data are not
available, it is conceivable that the selection standards so
determined might differ from standards based on actual performance.

¹ The studies reported in this paper were carried out while the author
was associated with the First National Bank of Boston.


METHODOLOGY OF INITIAL AND
CROSS-VALIDATION STUDIES


The initial study was carried out in 1951 on a group of 53 proof
machine operators with a minimum of six months of full time
experience. The work sample was administered according to standard
instructions using actual machine operating time as the score. To be
certain that length of service (after the first six months of
training) had no bearing on the operators’ speed scores, an analysis
was carried out relating length of service to speed. It was found that
the average time to complete the work sample was 16'40" for 24 cases
with 6 to 30 months of service; 16'52" for 15 cases with 31 to 54
months of service; and 16'31" for 14 cases with 55 to 106 months of
full time service as a proof machine operator. Thus length of service
is not a significant variable.

The reliability of the work sample was found to be .89 by applying the
Spearman-Brown prophecy formula to the average correlation between the
time required for the completion of each of the three parts of the
work sample.

The work sample time scores were then correlated with scores on the
following tests which were administered concurrently: the Short
Employment Tests (SET) CA-1 (Clerical Aptitude), V-1 (Vocabulary), and
N-1 (Arithmetic), published by the Psychological Corporation, and the
Hay Number Perception Test (Form A) and the Hay Number Series
Completion Test (Form B), published by Aptitude Test Service,
Swarthmore, Pennsylvania.

The cross-validation was carried out in 1956 as a follow-up project on
clerks who had been tested, hired, and trained subsequent to the
initial validation project in 1951. During the period between the two
studies there had been a tight labor market and, fortunately for this
study, it was not possible to completely enforce the test selection
standards determined by the first study.

The criterion for the follow-up study was developed by averaging the
number of items handled during the busiest hour of each day for a
period of one month. Only operators doing comparable work and who
worked at least 15 of the 20 days in the month were included. A total
of 30 operators qualified. The corrected split-half reliability of
this criterion was found to be .79. The length of service of this
group as full time proof machine operators ranged from 8 months to 25
months and, as with the work sample, a study of possible production
differences due to length of service showed none that were
significant. The production data obtained were then correlated with
the results of the tests that had been administered anywhere from one
to three years earlier.


TABLE 1

CORRELATIONS BETWEEN TEST SCORES AND PROOF MACHINE CRITERIA

[Table entries are illegible in this copy. Rows: SET CA-1, SET V-1,
SET N-1, No. Perception A, No. Series B, and the criteria. Columns:
Concurrent Study (Work Sample Operating Time) and Follow-up Study
(Actual Production, Items per Hour).]

COMPARATIVE RESULTS 


Table 1 presents the correlations between the five tests and the two
criteria along with the mean and standard deviations of the two groups
on the tests. Analysis reveals that in the original study, the SET
CA-1, the Hay Number Perception A, and the Hay Number Series B each
had correlations of .50 or better, suggesting that they might be good
predictors of success if used as selection tools. The SET N-1 also
correlated significantly at .31, but the SET V-1 at .10 was not
significantly related to the criterion.

In the cross-validation the correlation of the CA-1 dropped from .50
to .22, the Number Perception dropped from .62 to .35, and the Number
Series Completion changed slightly from .54 to .47. The N-1 remained
at .31 while the V-1 remained below the level of significance at .15.

In addition to the changes in the correla- 
tions, there were marked differences in the 
mean test scores and the standard deviations 
between the two groups. Some of this must 
be attributed to chance selection factors, but 
a separate study of test-retest scores for these 
five tests showed that scores on the CA-1 and 
the Number Perception A were strongly sus- 
ceptible to experience of a clerical nature. 
Thus the higher mean scores on these tests 
of the clerks in the first group reflect their 
status as fully experienced clerks at the time 
of testing. Scores of the experienced operators 
deviated from the mean almost as much as 
the inexperienced, so it may be assumed the 
tests retain their significance so long as appro- 
priate norms are used for the individuals 
being studied. 

Following Hay (1950) the possibilities of 
multiple cutoff scores were studied and their 
effect determined in the prediction of work 
sample speed and also actual production. In 
the initial study combinations of the Number 
Series Completion Test with the CA-1 or the 
Number Perception Test were found to be 
equally effective in predicting work sample 
performance. Further work showed that a me- 
dian score on any two of the three tests was 
the best predictor. While the CA-1 failed to
correlate significantly in the cross-validation,
it is entirely possible that a chance distribution
of scores worked against it in the sample
studied. Analysis of the scatter diagram
showed that high scorers tended to be high
producers but the test did not clearly identify
the low producers among those with low
scores. In view of its early success it has been
included in the study of multiple cutoff scores
for whatever it might add.


TABLE 2

EXPECTANCY TABLE FOR WORK SAMPLE SPEED SCORES

[Cell entries are illegible in this copy. Rows: passing scores
achieved on all 3, any 2, any 1, or none of the three tests
(No. Series B, CA-1, No. Perception A), with totals. Columns: N and %
in the lowest, middle, and top thirds of work sample speed scores,
and total.]

Tables 2 and 3 present multiple cutoff data 
for the two studies in the form of expectancy 
tables for work sample speed scores and 
actual production in relation to the attain- 
ment of median scores or better on all three, 
two of three, one of three, or none of the 
three tests discussed. In each table, the median
scores used were those of the group in
the study, rather than a generalized population
mean that would not allow for the differences
in experience. The best results appear
to be achieved by using all three tests together
with the requirement that the subject
pass any two of the three at the group
median.

In Table 2, dealing with the initial valida- 
tion, better than half (58%) of those quali- 
fying on all three tests are in the top third 
in performance. The percentage of success is 
almost as high (50%) in the group who pass 
any two of the three tests. Practically half of 
the total group (26 out of 53) qualify at this 
level so that the standards are not too high. 
The percentage of successful operators drops 
off sharply where only one test is passed with 
the bulk of them being recorded in the middle 
or low thirds. When all three tests are failed, 
73% of the cases are in the bottom third on 
the work sample speed scores. 

Almost identical results are achieved in the cross-validation study as
shown in Table 3. Applying the standard of passing any two or all
three at the group median, we find that 9 of the 17 (53%) who so
qualify produced in the top third, while 6 of 13 (46%) who failed to
qualify were in the bottom third. Top honors went to those who passed
all three of the tests, with 5 of 8 producing in the top third, but
this would be too highly selective in most labor markets.


TABLE 3

EXPECTANCY TABLE FOR PROOF MACHINE PRODUCTION

[Cell entries are illegible in this copy. Rows: passing scores
achieved on all 3, any 2, any 1, or none of the three tests
(No. Series B, CA-1, No. Perception A), with totals. Columns: N and %
by average hourly production in items per hour: lowest third (to 776),
middle third (783-902), top third (907 and up), and total.]

In actual production, the 15 who passed 
two or three tests averaged 890 items per 
hour while those who passed only one or 
failed all three of the tests averaged only 790 
items per hour. This means the superior group 
processed almost 13% more items than the 
slow group. Clearly, in any installation where 
a considerable number of people and ma- 
chines are involved, this would represent a 
considerable saving. 
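
The arithmetic behind these figures is easy to verify. The short sketch below recomputes the production gain and the two expectancy percentages from the counts given in the text; the helper names are chosen for the example.

```python
def percent_gain(high: float, low: float) -> float:
    """Percentage by which `high` exceeds `low`."""
    return 100.0 * (high - low) / low


def percent(part: int, whole: int) -> float:
    """`part` as a percentage of `whole`."""
    return 100.0 * part / whole


print(round(percent_gain(890, 790), 1))  # gain of the qualifying group in items per hour
print(round(percent(9, 17)))             # qualifiers producing in the top third
print(round(percent(6, 13)))             # non-qualifiers in the bottom third
```

A difference of 100 items per hour on a base of 790 is about 12.7%, which is the "almost 13%" cited.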


DISCUSSION 


While the two studies have shown that 
multiple cutoff scores are a practical and 
sound way to utilize test results, interpreta- 
tions of the variations in the correlations need 
to be made. It is believed that these are due 
mostly to the differences in the criteria. The 
work sample is essentially a speed test in the 
mechanical operation of the machine, while 
actual production is influenced by the accu- 
racy of the clerk. The first study suggests that 
clerical perception of numbers and names is 
very significant in speed of operating the ma- 
chine, while the second study suggests that 
the most important factor in regular work is 
numerical reasoning ability as measured by 
the Number Series Completion Test. This is 
felt to be a more abstract intellectual function 
than arithmetic skill which has a lower level 
of significance in this study. In normal operation,
a clerk will make one to two errors
per thousand items handled, so “down time”
while locating errors can be a significant factor
in production. Conceivably, the job requires
a person who is quick in an abstract
numerical sense and who through logic would
locate and correct errors quicker than someone
who was merely good in arithmetic or
perceptual speed.

Following this reasoning, a combination of high abstract numerical
ability with speed in the perception of numbers and names would appear
to be required for better than average performance on the IBM proof
machine. Perceptual skill will lead to speed and accuracy in listing
items while numerical intelligence will tend to minimize lost time
solving problems related to the work. Where all three abilities are
present, a superior operator is almost assured, and if two of the
three are present, good performance can probably be counted on.


SUMMARY 


In 1952 the American Bankers Association reported on a number of
validation studies in banks including correlations between total time
on an IBM proof machine work sample and test scores obtained
concurrently on a number of experienced operators by several different
banks. One large bank which contributed to this report undertook a
follow-up cross-validation study using actual production data.
Comparison of the original work sample study by this bank with the
cross-validation revealed that the predictors determined by the first
study were essentially accurate. A combination of numerical reasoning
ability with perceptual speed and accuracy appears to be involved in
successful operation of the machine. A multiple test battery with the
requirement that two of the three tests be passed at the group median
or better had a very good selective efficiency against both criteria
and is recommended to increase production by almost 13%.


A MULTIPLANT FACTOR ANALYSIS OF EMPLOYEES’
ATTITUDES TOWARD THEIR COMPANY


Most published factorial studies of employee attitudes such as those
by Ash (1954), Baehr (1954), Wherry (1954), Gordon (1955), and Wilson,
High, Beem, and Comrey (1954) have dealt with global types of attitude
surveying instruments which had been designed to “cover the
waterfront” of employee attitudes. For example, the SRA Employee
Inventory, which was the instrument used in the studies of Ash, Baehr,
and Wherry, yields category scores for each of 14 separate aspects of
the work situation.

In this study the factorial content of an attitude questionnaire
designed specifically to assess employees’ attitudes toward their
company was investigated. The purpose of the investigation was
two-fold: (a) to determine whether responses to an attitude
questionnaire carefully designed to measure a specific attitude, i.e.,
Attitude toward the Company, were unifactorial or multifactorial, and
(b) if responses were multifactorial, to investigate the nature of the
factors present and their relation to the results of other factorial
studies.


PROCEDURE 


Questionnaire. The questionnaire employed in this study is entitled
About Your Company. It contains 20 dichotomously scored items. Each
item survived two internal consistency item analyses during the
development of the questionnaire. One of these analyses was the usual
type of internal consistency item analysis based upon the data from
eight plants; the other was designed to select a group of items which
was internally consistent for both male and female respondents. In
addition, the items had previously passed screenings for
communicability, maximum range of difficulty, brevity, and face
validity. The coefficient of internal consistency (split-half method)
for the questionnaire is .92 (Storey, 1955). In administration the
questionnaire is mailed to the respondents’ homes. The questionnaire
is anonymous. Completed questionnaires are mailed by the respondents
directly to the Occupational Research Center, Purdue University.

The sample. Questionnaires were mailed to all production employees of
10 different plants located in seven central Indiana cities. The
factor analysis is based upon the 735 questionnaires (approximately
50% of those sent) which were returned by the respondents to Purdue
University. Industries represented and the number of respondents
employed in each type of industry are as follows: communications
(230), electrical equipment manufacture (253), metal working (160),
and ceramic insulation material manufacture (92). In all companies the
respondents occupied jobs covering a wide range of skill levels;
approximately 25% of the respondents were women. Since the goal of a
factor analytic study such as this is the identification of a
factorial structure which is as stable and which permits as broad a
generalization as the limitations of the total data allow, all 735
responses were considered without regard to company identification,
skill level of the job which the respondent held, or the sex of the
respondent. If specific factorial differences between companies, skill
levels, or sexes are present, this study was not designed to reveal
them.


ANALYSIS OF QUESTIONNAIRE RESPONSES 


Tetrachoric interitem correlations were computed using the method
reported by Jenkins (1955). Interitem correlations in the 20 × 20
matrix thus formed ranged from +.43 to +.81, with a median of +.65.
The matrix was factored by the complete centroid method and three
factors were extracted. After extraction of these factors the mean
residual interitem correlation was +.003 (of course, the method of
factoring employed should yield a mean residual correlation of .00).
The range of the residuals was from −.15 to +.10. Seventy percent of
the residuals lay between ±.05.

The three factors were then rotated orthogonally to simple structure.
The items with their rotated factor loadings are listed in Table 1.
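The extraction of a single centroid factor can be sketched as follows. This is a minimal illustration and not the authors' computing procedure: the function name `centroid_factor` and the small example matrix are hypothetical, the diagonal cells stand in for assumed communality estimates, and the sign-reflection step needed before extracting later factors is omitted. It does show why the mean residual is expected to be .00: once a centroid factor is removed, the entries of the residual matrix sum to zero.

```python
import math


def centroid_factor(r):
    """Extract one centroid factor from a square correlation matrix.

    Each loading is the column sum divided by the square root of the
    grand total of the matrix. Returns (loadings, residual_matrix).
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    total = sum(col_sums)
    if total <= 0:
        raise ValueError("grand total must be positive; reflect variables first")
    root = math.sqrt(total)
    loadings = [s / root for s in col_sums]
    # Residual correlation = original correlation minus product of loadings.
    residual = [
        [r[i][j] - loadings[i] * loadings[j] for j in range(n)]
        for i in range(n)
    ]
    return loadings, residual


# Hypothetical 3 x 3 matrix; diagonal cells are assumed communality estimates.
example = [
    [0.70, 0.50, 0.60],
    [0.50, 0.60, 0.40],
    [0.60, 0.40, 0.50],
]
loadings, residual = centroid_factor(example)
residual_sum = sum(sum(row) for row in residual)
print([round(a, 3) for a in loadings])
print(abs(residual_sum) < 1e-9)  # residuals sum to zero up to rounding
```

Repeating the step on the residual matrix, after reflecting variables so that the column sums are positive, would give the second and third factors.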


INTERPRETATION OF THE FACTORS 

The orthogonal rotation shows the existence of a large General Factor
(G) and of two Group Factors (A) and (B). These factors will be
discussed separately.

The General Factor (G). This is by far the dominant factor in this
study. This is demonstrated by the fact that all 20 items have their
highest loading on this factor. It seems clear that this factor
represents the employees’ general attitude or bias toward their
company. It may well be another expression of the General Factor found
by Wherry (1954) in his orthogonal rotation of the factoring of the
SRA Employee Inventory.

TABLE 1

QUESTIONNAIRE ITEMS WITH THEIR ROTATED FACTOR LOADINGS

Item

Would you say that the company is usually hardboiled and tough with its employees?
Do you like to have your friends know where you work?
Considering everything about the company, are you fairly well satisfied with it?
Do you think your company has more dissatisfied employees than most companies?
Is there any other company around here where you would rather work?
If you were starting over again, would you probably go to work here?
Is there a friendly feeling in your company between the employees and management?
Would you say that your company is a better place to work than most around here?
Does the company sometimes interfere with your personal rights?
Does the company ever take advantage of the employees?
Do the top people respect your rights as a person?
If you were in real trouble would you probably get a square deal from the people at the top?
Do you feel that the top men in the company are trying to do the right thing?
Do you have confidence in the business judgment of top management?
Do the people at the top pay enough attention to ambition and effort?
Do you think the company is really trying to improve relations with its employees?
Does management usually keep you informed about the things you want to know?
17. Does your company offer enough chance for self-improvement and learning?
16. Is your company a good one for a person trying to get ahead?
18. Do employees usually have to fight for what they get in your company?

* Items are listed out of the order in which they appear in the
questionnaire to facilitate interpretation of factorial content.

Group Factor (A). In interpreting both of the group factors, only
factor loadings of .30 and higher were considered as significant.
Group Factor (A) was called Respect for Personal Rights. Note in
Table 1 that in order of magnitude the items with factor loadings of
.30 or higher on this factor are 4, 15, 7, 6, 9, and 11. In this
factor employees’ perceptions of their freedom within their plant and
of management’s concern for their personal welfare are dominant. This
factor appears to be similar in content to the factor called
Consideration in Halpin’s factor analyses of the leadership behavior
of aircraft commanders (1955).

Group Factor (B). Group Factor (B) was called Opportunity for
Self-improvement. The items with significant loadings on this factor,
in order of magnitude, are 19, 13, 14, 17, 7, 9, 16, 11, and 18. Items
with significant factor loadings are concerned with the employees’
perceptions of opportunities for self-development, promotion, and
increased knowledge of company operations.

Note that Item 13, which might logically 
be expected to have a significant loading on 
Group Factor (A), instead has a significant 
loading on this factor. This fact may indi- 
cate that Factor A tends to tap employee 
reactions to past experiences in the company, 
while Factor B tends to tap perceptions of 
changes in company policies and actions. 

DISCUSSION 

The existence of three factors within the 
framework of questions which are concerned 
exclusively with the employee’s attitudes to- 
ward his company, leads one to ask if the 
total factorial structure of employees’ work 
attitudes may not be more complex than pub- 
lished factorial studies have indicated. Global 
surveying techniques may not allow enough 
overdetermination of true factorial content 
for the identification of all group factors 
which are present. 

The similarity between the factorial content of this questionnaire and
other attitude surveying instruments does, however, lend encouragement
to the belief that the complex of employees’ attitudes may be reduced
to a more manageable number of factors or elements.


SUMMARY 


Seven hundred thirty-five production employees in 10 plants responded
to a 20-item attitude questionnaire concerned with attitude toward
their company. The item intercorrelation matrix was factor analyzed.
Three factors were extracted and rotated to orthogonal simple
structure. The dominant first factor represents a large general factor
of general attitude or bias toward the company. One of the two group
factors represents the perceived status of management’s regard for
employees’ personal rights. The other group factor represents
perceived opportunities for self-improvement.

Similarities and contrasts to other factorial 
studies in this general area were discussed. 
In the interview situation among the variables
which it is not possible to change by
training or practice are the personal charac- 
teristics of the interviewer. For certain types 
of interview these characteristics may become 
a bias-generating factor in the interview situ- 
ation, leading to a systematic source of error. 
Age, sex, ethnic group, and color are some 
of these major background characteristics. 
Of these variables age differences have been 
shown not to be significant (Cantril, 1947, 
p. 113). Differences in sex have been found 
to have an effect when questions concerning 
sex relationships were asked. In this case 
attitudes are expressed more freely to male 
interviewers by respondents of both sexes 
(Hyman, Cobb, Feldman, Hart & Stember, 
1954, pp. 165-166). 

The purpose of the present experiments was to determine whether
interviewers belonging to different ethnic groups would elicit
significantly different responses to questions involving racial
issues. In the first experiment an Oriental (J. T.) and a Caucasian
interviewer (K. R. A.) questioned white respondents concerning the
degree of acceptance of Orientals. It was predicted that the answers
given to the Oriental would be less segregationist and nearer to what
may be thought to be “socially correct” answers for that interviewer.

In the second experiment, a Negro (J. E. C.) and Caucasian interviewer
(A. P. R.) questioned middle-class white college students and property
owners concerning the effect on property values of Negroes moving into
their neighborhood. Again it was predicted that the answers would be
modified to the Negro interviewer. On the other hand more modified
answers would be expected on the part of the seniors to the Negro
interviewer than from the freshmen, because of increased “audience
sophistication.”

Little research has been done on the racial 
variable. Hyman (1954, p. 159) reports an 
experiment carried out by the National Opin- 
ion Research Center in Memphis in May 
1942. White and Negro interviewers each 
interviewed approximately 500 Negroes con- 
cerning their opinions and attitudes toward 
the war effort. The white interviewers ob- 
tained a significantly higher proportion of 
what might be termed “proper” or “accept- 
able” answers on almost all the opinion and 
attitude questions. Robinson and Rohde carried out two experiments
with an anti-Semitic poll in 1946. They found significantly affected
responses (tf = 2) between answers to Jewish-appearing and
non-Jewish-appearing interviewers; and secondly, that significantly
less anti-Jewish feeling was expressed in response to an indirect form
of question.


METHOD 


Experiment I. The six questions on racial matters were designed with
Yes-No answers for a house-to-house interview. The age and sex of the
interviewers differed, but, as indicated above, age does not tend to
bias the results, and the questions were not of such a nature as to be
affected by the sex of the interviewer. Both interviewers stated that
they were college students when requesting cooperation in the project.
Thus the only background variable under examination was that of the
race of the interviewer. An opening paragraph set the desired
Caucasian-Oriental framework. The six questions dealt with the
following interracial situations: (a) children playing together, (b)
next-door neighbor of a different race, (c) intermarriage by a friend,
(d) embarrassment on introducing a couple of mixed marriage to
friends, (e) medical examination by a physician of a different race,
and (f) acceptance or rejection of a son who marries an Oriental when
overseas. Fifty middle-class respondents residing in the same
residential area were questioned by each interviewer.





Effect of Interviewer’s Racial Background 


The answers of each respondent were interpreted as prejudiced or
unprejudiced. A completely unprejudiced score on the six questions was
zero, a completely prejudiced score was six. Hence there were seven
possible scores for each subject, ranging from zero to six. The
hypothesis was tested that the race of the interviewer had no effect
on the respondents’ answers.

Experiment II. Three questions were asked by two female interviewers,
Caucasian and Negro, of approximately the same age. The questions
asked whether the respondent had discussed (a) with the next door
neighbor or (b) with anyone else in the block the possibility of a
Negro moving into the neighborhood, and (c) if this happened would the
respondent expect the value of the property to go down. In scoring,
two points were allotted for a “Yes” answer, one point if the
respondent would “rather not answer,” and zero for a “No” answer.
Again the range of the scores was from zero to six. Each interviewer
questioned 25 freshmen, 25 seniors, and 25 homeowners. The freshmen
and seniors were selected at random from passersby on the University
of California, Berkeley, campus.
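The scoring rule for Experiment II can be written out as a short sketch. The point values come from the rule stated above; the function names and the sample responses are hypothetical and are included only to illustrate the arithmetic, not to reproduce any data from the study.

```python
# Point values follow the scoring rule stated for Experiment II.
POINTS = {"yes": 2, "rather not answer": 1, "no": 0}


def score_respondent(answers):
    """Return the total score (0 to 6) for one respondent's three answers."""
    if len(answers) != 3:
        raise ValueError("expected exactly three answers")
    return sum(POINTS[a.strip().lower()] for a in answers)


def mean_score(respondents):
    """Mean total score over a list of respondents' answer triples."""
    return sum(score_respondent(r) for r in respondents) / len(respondents)


# Hypothetical responses, for illustration only (not data from the study).
sample = [
    ["Yes", "No", "Rather not answer"],
    ["No", "No", "No"],
    ["Yes", "Yes", "Yes"],
]
print([score_respondent(r) for r in sample])
print(mean_score(sample))
```

Group means computed this way are the quantities compared between interviewers by the t tests reported in the results.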


RESULTS 


Experiment I. A t test for significance between the means for the two
interviewers showed for a one-tailed test a probability of < .00003.

For the Caucasian interviewer, prejudice scores ranged from zero to
five with a mean of 1.20, 19 scores being completely unprejudiced. For
the Oriental interviewer, prejudice scores ranged from zero to two,
the mean was 0.60, there being 30 prejudice-free scores. The results
are more divergent than these figures would indicate because the first
question, concerning children playing together, was not sufficiently
sensitive. No prejudice scores were recorded on this question.

Experiment II. The t tests for significance between the means (all
one-tailed tests) gave the following results: (a) For freshmen the
results were not significant. (b) For seniors there was a highly
significant result at p < .00003. (c) For homeowners there was a
significant result at p < .05. (d) Between the 75 respondents of the
white interviewer and those of the Negro interviewer the result was
again highly significant at p < .0005. (e) The comparison of the
college freshmen with the college seniors in their answers to the
Caucasian interviewer showed an insignificant tendency for seniors to
state they had discussed racial issues more than freshmen at p < .10.
(f) The same comparison with the Negro interviewer shows only a slight
tendency to reserve among the seniors at p < .10.


DISCUSSION 


In both studies significantly more “socially 
acceptable” answers were given to the Orien- 
tal and Negro interviewers. The one exception 
to this was in the group of 25 freshmen, who 
showed no significant difference between their 
answers to the two interviewers. In their case 
it is possible that the questions had not much 
relevance for them. In the general case it is 
considered that the reason for the different 
answer patterns is basically due to the in- 
volvement obligations outlined by Goffman 
(1955), where the speaker tends to scale 
down his expressions in order to protect the 
“conversational bridge” even in the interview 
situation. 

Both the Oriental and Negro interviewers 
found that people were sensitive to their 
racial origin. The Oriental interviewer (Chi- 
nese) was asked particularly if she were 
Japanese, many respondents making a point 
of telling her that Orientals are respectable 
people. The Negro interviewer was told by 
some respondents “I know several wonderful 
colored people,” and “I’m all for you,” and 
similar comments. 

To both interviewers the seniors expressed 
slightly more liberal views regarding property 
value than the freshmen. But there was a 
marked increase in the frequency with which 
they admitted discussing Negroes moving 
into the neighborhood to the white inter- 
viewer, and a marked decrease to the Negro 
interviewer. Since the number who had discussed these issues might be
expected to increase with age, the difference between freshmen and
seniors seems to lie entirely in unwillingness to admit to a Negro
interviewer that such discussion had occurred. In the homeowners, the
attitude question was modified as well, chiefly by an increase in
refusals to answer the Negro interviewer.

Some aspects of the role of university students may arise in this type
of interview. They are probably considered more liberal in outlook,
and not, as is normally assumed to be the case with interviewers,
neutral toward both interviewee and topic (Dexter, 1956). If the
interviewers, then, had not been university trained, there might be a
tendency for the results to be more divergent.

A willingness to agree has been shown to 
occur much more among less educated and 
lower economic groups than among more edu- 
cated groups (Robinson & Rohde, 1946, p. 
144). In general, therefore, it might be ex- 
pected that the results would be more diver- 
gent if the interview were carried out on a 
larger scale among groups at a lower eco- 
nomic level. 


SUMMARY 


Two experiments were carried out to deter- 
mine whether interviewers belonging to dif- 
ferent ethnic groups would elicit significantly 
different responses to brief questionnaires 
involving racial issues. The first experiment 
used an Oriental and Caucasian interviewer, 
and the second a Negro and Caucasian. The 
overall results showed a highly significant 
difference in the responses to the two inter- 
viewers of the minority race. Some differences 
in answering were noted between college freshmen and seniors. It
appears that more seniors than freshmen had discussed the issues but
the seniors were less willing to admit this to the Negro interviewer.

Athey, Coleman, Reitman, and Tang

It was suggested that the main cause of the 
difference occurred because the respondents, 
due to the involvement obligations described 
by Goffman (1955), scaled down their ex- 
pressions to avoid embarrassing the inter- 
viewer of the minority race and to protect 
the “conversational bridge.”
Maier and Maier (1957) have reported
that leaders using a “developmental” tech- 
nique for group discussion produced more 
high-quality decisions than did leaders using 
a “free” discussion technique. The free dis- 
cussion technique was described as one “in 
which the leader poses the problem, then con- 
ducts the discussion in a permissive manner 
without making value judgments, but merely 
helps the group reach agreement on a solu- 
tion.” In the developmental discussion tech- 
nique the leader breaks the problem into parts 
so that each part of the problem is discussed 
separately before the final decision is made. 

Although their results showed that the use 
of the developmental technique weakened a 
majority trend toward poor quality decisions, 
the Maiers suggested that “a skilled discussion
leader can actually reverse the
trend.” The study to be reported here was
designed to test whether the proportion of 
high-quality decisions reached by the devel- 
opmental discussion method could be in- 
creased when leaders were given a greater 
understanding of the method and a demon- 
stration of its use. 


PROCEDURE 
The Problem 


As in the initial study (1957), the problem used here was the “Case of
Viola Burns.” In the case, Viola is a young office worker who is being
considered for promotion to a position as private secretary to a sales
executive. A general description of Viola, an interview between Viola
and the personnel manager, and a second interview between Viola’s boss
and the personnel manager provided information to all group members
about: (a) Viola’s personality, appearance, intelligence, relations
with other workers, job duties, and job performance; (b) her boss’
favorable attitude toward her; (c) a description of the new job and
the importance the new boss attaches to having it properly filled; and
(d) Viola’s indecision about accepting the new offer. This indecision
serves as the basis for the problem given to the groups: Should Viola
be encouraged or discouraged from taking the new job? In order to rule
out the possibility of conflicting interests, the decision to
encourage or discourage Viola was made first from the point of view of
the company and then from Viola’s point of view. Although a majority
of people usually vote to encourage her from both points of view, the
data really indicate that her personality and ability to relate to
other people would make her unable to do the job. Thus the decision to
“discourage Viola from accepting the new job” was considered to be of
higher quality than the one to “encourage” her. Training in the
developmental technique was expected to increase the proportion of
group members under their trained leaders who vote to discourage Viola
over the proportion who voted that way under untrained developmental
leaders in the original Maier and Maier study.

1 The research reported here was supported by USPHS Grant No. M-2704
from the National Institute of Mental Health, United States Public


Leaders 


Twenty-two graduate students in an advanced 
course in supervisory training were trained as leaders 
in the developmental discussion technique. All but 
one of the 22 were men and they represented a 
variety of fields of specialization. Since the experi- 
ment was conducted toward the middle of the 
semester, the leaders had all received general training 
in the leadership methods of group decision which 
was about equivalent to that received by the leaders 
in the initial Maier and Maier study. Ten of the 22 
had also taken a previous course in which they 
received a full semester’s training in group decision, 
discussion methods, and management techniques. In 
that course, these 10 also learned that the authors’ 
preferred decision to the Viola Burns problem was, 
“discourage.” The results of these 10 leaders, there- 
fore, will be presented separately and will represent 
both the effects of extended training and a knowledge
of the preferred answer to the problem.


Subjects 


Each of the 22 students recruited, from outside friends and
acquaintances, two groups of three people and led each group
separately in a developmental discussion about the Case of Viola
Burns. A total of 44 groups or 176 Ss was obtained. The votes of one
group were not returned in usable form and three group leaders failed
to record their own votes, leaving 88 Ss with leaders previously
unexposed to the case and 79 with leaders who knew the desired answer.


The Developmental Discussion Method 


The developmental technique was devised to improve the quality of
group decisions by insuring the systematic and simultaneous discussion
of various aspects of the problem within the framework of a permissive
free discussion. Thus, in the Maier and Maier study (1957), the
instructions for the developmental leaders were added to those given
to the free discussion leaders. The latter were told:

You are meeting to decide whether or not Viola should, in the next
interview, be encouraged or discouraged about taking the new job. The
case as presented gives all the known facts. This instruction sheet
tells you how to conduct this meeting. In general,

A. Try to get everyone to voice their views and to give reasons for
their ideas. Encourage interaction of ideas.

B. Do not impose your views on the group. Be as permissive as you can.

C. See if you can get agreement on the recommendations made.

D. Get a final vote on the recommendations and be ready to report them
to the class.

Lead the discussion to decide the following:

1. From the point of view of the good of the company, we recommend
that Viola be (discouraged from taking) (encouraged to take) the new
job.

2. From the point of view of Viola’s best welfare, we recommend that
she be (discouraged from taking) (encouraged to take) the new job.

In addition, the developmental leaders were instructed to use the
following procedure:

To assist in making the final decision, obtain unanimous group
decisions on each of the preliminary problems below:


Problem 1. Develop a list of Viola’s activities on her present job.

Problem 2. Grade Viola’s proficiency on each with letters A, B, C, D,
or E, and write the grade after the activity.

Problem 3. Develop a list of activities Viola would be expected to
perform on the new job.

Problem 4. Grade how well your group thinks Viola will do on each.

Problem 5. Select the three activities Viola’s new boss will consider
most important for the success of his office.

After completing these five problems the groups were to arrive at
decisions concerning the above two issues, corresponding to those
decided upon by the free discussion groups.

Norman Maier and Richard Hoffman

Votes were obtained from questionnaires completed individually and
privately by each S following the group discussion. Two questions were
asked regarding the decisions.

1. From the point of view of the good of the company, I recommend that
Viola be (check one)
   discouraged from taking the new job
   encouraged to take the new job.

2. From the point of view of Viola’s best welfare, I recommend that
she be (check one)
   discouraged from taking the new job
   encouraged to take the new job.


Training in Developmental Discussion 

Training in the developmental technique for the 22 leaders consisted
of an hour-and-a-half session in which one of the authors (LRH)
conducted the developmental procedure with the entire class through
the solution to Problem 2 in the above list. This consisted of a
reading of the case materials and a listing and grading of Viola’s
activities on her present job. During the demonstration the trainer
answered questions about procedure and methods of resolving different
disagreements which might arise within the groups. These were done
principally by example. Emphasis was placed on obtaining: (a)
extensive lists of Viola’s present job activities, including being a
member of an office group, (b) lists of the activities necessary for
the new job, and (c) evaluations which “best fit” the sense of the
group without destroying the permissiveness of the discussion. The
trainees were given no indication of the answer preferred by the Es
and were asked not to discuss the case among themselves or with
anybody else who was familiar with it. Since almost all the 12 leaders
who had not had the case previously asked the Es what the correct
answer was, we can assume that this request was complied with. No
practice was given the trainees in conducting this type of discussion,
although all were told to review all the steps in the procedure to
make sure that they could conduct it without difficulty.


RESULTS 


The number and proportion of votes to encourage or discourage Viola
from taking the new job in the interest of the company and in her own
interest are reported in Table 1 for Ss under leaders with three
different amounts of training in the developmental method beyond the
general human relations training of the course: (a) Instructions only:
leaders in the Maier and Maier study (1957) who received a dittoed
sheet of instructions containing the steps in the developmental
procedure; (b) Instructions and Demonstration: leaders in the Advanced
Supervisory Training course who received the dittoed instructions and
a demonstration of their use; and (c) Instructions, Demonstration, and
Management Course: leaders who besides the dittoed instructions and
the demonstration had received additional training in group decision
methods in the Psychology of Management course and had been told the
preferred answer to the case. The percentage of people who voted to
discourage Viola from both points of view increased successively under
leaders from Groups 1 through 3. Thus, giving the leaders a better
understanding of the developmental discussion method resulted in a
greater proportion of group members voting for the higher quality
decision. The significance of the chi square tests among training
groups from both points of view is accounted for largely by the sharp
increase in the proportion of discourage votes for both groups under
the trained leaders as compared to the groups in the Maier and Maier
study (Group 1). The differences between the two groups of trained
leaders, though evident, were significant only for the votes in
Viola’s interest.

TABLE 1

EFFECTS OF TRAINING ON DECISIONS REACHED

Type of Training of Developmental Leaders*
(1) Instructions Only
(2) Instructions and Demonstration
(3) Instructions, Demonstration, and Management Course

Viewpoint              Decision      (1)      (2)      (3)
Interest of Company    Encourage     60.3%    38.6%
                       Discourage    39.7%    61.4%
                       N             117
Interest of Viola      Encourage     60.3%    44.3%
                       Discourage    39.7%    55.7%
                       N             117

* Chi square values for the two viewpoints are 29.38 and 25.06,
respectively, both significant at the .01 level of confidence. Within
both viewpoints all 2 × 2 chi squares between groups are significant
at the .05 level of confidence, except between groups (2) and (3) in
the interest of the company.

Since the votes in Table 1 include those of the leaders, we wondered
whether the difference between the two trained groups was an artifact
of the fact that the leaders who had had the case probably voted for
the preferred answer. A comparison of the votes of the members only
still supports the higher quality of decisions under leaders who knew
the problem and had had the greater amount of training, but the
difference again is only significant for the votes in Viola’s
interest. Thus, in general, the quality of decisions was increased
markedly by giving the leaders a demonstration of the developmental
discussion method and increased somewhat more by familiarity with the
problem and more experience with the group decision method.
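The chi square comparisons of vote distributions between training groups can be sketched as below. This is a minimal illustration of the Pearson statistic for a table of counts, without a continuity correction; whether the authors applied one is not stated. The function name `chi_square` is hypothetical, and the counts in the example are invented for illustration and are not the votes from this study.

```python
def chi_square(table):
    """Pearson chi square for a table of observed counts (list of rows).

    Returns (statistic, degrees_of_freedom). No continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts (not from the study):
# rows = two training groups, columns = encourage / discourage votes.
votes = [
    [10, 20],
    [20, 10],
]
stat, df = chi_square(votes)
print(round(stat, 2), df)
```

For a 2 × 2 table there is one degree of freedom, so a statistic above about 3.84 is significant at the .05 level.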

A comparison was made also of the results 
of the first and second sessions for both sets 
of trained leaders but none of the differences 
is significant. One experience in the conduct 
of a developmental discussion without review 
or coaching was not in itself sufficient, there- 
fore, to significantly improve the ability of 
these leaders. 


DISCUSSION 


The results of this study provide striking 
evidence for the power of the developmental 
discussion to improve the quality of group 
decisions. Whereas in the Maier and Maier 
study the use of the developmental procedure 
by untrained leaders served to significantly 
weaken a majority trend to encourage Viola, 
a demonstration of the first two steps of the 
technique was sufficient to permit leaders to 
reverse the trend so that a majority voted to 
discourage Viola. 

Although one would not ordinarily expect such a marked training effect
from a mere demonstration, it is clear that this particular
demonstration accomplished several things which permitted these
leaders to use the developmental method more effectively than did the
leaders in the initial study.


1. The demonstration showed the leaders 
how to start the discussion of the problem 
following the reading of the dialogues and 
how the several steps in the developmental 
procedure should be meaningfully fitted into 
the overall statement of the problem of 
whether Viola should be encouraged or dis- 
couraged from taking the new job. The lead- 
ers were thus shown that the discussion of 
each step can proceed smoothly from one to 
the next without being an artificial mechani- 
cal procedure. 

2. The several substeps in the procedure 
were described and their purpose explained, 
so that the leaders would understand their 
purposes. There was some reason to believe 
that in the earlier study some of the develop- 
mental leaders failed to follow the procedure 
because they did not understand it. 

3. By actually conducting a developmental 
discussion through the second step, the trainer 
was able to clarify the distinction between 
listing the activities Viola was supposed to 
perform on her job and the evaluation of her 
performance. This separation of listing and 
evaluation is not always effected when people 
receive only the list of instructions. 

4. The demonstration also served to clearly 
distinguish between the techniques of the de- 
velopmental method and the usual free dis- 
cussion method for solving problems. Both 
the actual demonstration and the trainer’s 
comments about the procedure illustrated the 
fact that the developmental technique sys- 
tematizes the problem solving discussion with- 
out sacrificing the consideration of alternative 
solutions under the free discussion method. 


By performing these functions, the demon- 
stration thus permitted the leaders to see how 
their previous training in group decision 
methods could be applied to use the develop- 
mental technique more effectively. Many of 
the developmental leaders in the original 
study may not have been able to see this 
transfer of training and may have viewed the 
developmental technique as a new and dif- 
ferent method for group problem solving. 

The fact that the distribution of votes
among Ss led by leaders who had not had
the case previously was not very different 
from that of Ss under leaders who knew the 
case suggests that it is the developmental 
technique rather than the knowledge of the 
case which is most important in improving 
the quality of decisions. Furthermore, it is 
not clear that the somewhat higher quality 
of the decisions in the latter group was not 
due as much to the leaders’ greater familiarity 
and experience with the methods of group 
decision as it was to their knowledge of the 
preferred decision. 


SUMMARY 


A study was conducted to test the hypothe- 
sis offered by Maier and Maier (1957) that 
the use of the developmental discussion tech- 
nique by skilled leaders would increase the 
proportion of high quality decisions by re- 
versing a strong majority trend toward poor 
quality decisions in the Case of Viola Burns. 
Twenty-two students in a graduate course in 
supervisory training were given an instruction 
sheet containing the five steps in the develop- 
mental technique as adapted to this case and 
a demonstration of how to conduct the dis- 
cussion through the second step of the devel- 
opmental procedure. Each student conducted 
two developmental discussions among groups 
of three persons each, to determine whether 
Viola should be encouraged or discouraged 
from taking the new job which had been 
offered to her. 

The percentages of respondents who voted 
to “discourage Viola from taking the new 
job” in the interest of the company were 
39.7% for Ss under untrained developmental 
leaders, 61.4% for Ss under trained leaders 
without prior experience with the case, and 
74.7% of Ss under trained leaders who had
had the case previously; the percentages of
votes to discourage Viola in her own interest
were respectively 39.7%, 55.7%, and 73.4%.
The distributions of votes on both viewpoints 
are significantly different by chi square test. 
The differences between the votes of Ss under 
trained leaders and those under untrained 
leaders indicate that the demonstration was 
sufficient to improve the skills of the leaders 
and to produce a greater proportion of high
quality solutions. For the Ss in groups under
the trained leaders, the difference in the
proportion of votes to “discourage” Viola was
significant only from Viola’s viewpoint, but
the significance of this difference was maintained
even when the leaders’ votes were removed 
from the analysis. Thus, greater experience 
with the methods of group decision and fa- 
miliarity with the case also contributed some- 
what to higher quality decisions. 


The results of the study provide strong 
evidence for the power of the developmental 
discussion technique for producing decisions 
of high quality.


In a previous study in which Mecherikoff
and Horton (1959) attempted to find pairs of 
letters similar enough in appeal to be used for 
labeling packages in studies of brand prefer- 
ences, one of the preliminary steps was to ask 
two groups of university students to rank
order the alphabet, one group according to
the pleasantness of the sound of the letter,
and the other according to the pleasantness
of the appearance of the letter. Recently more
data were collected using basically the same
procedure, but with a larger and more adequate
sample. In this case we were interested
only in rankings according to the appearance
of the capital letter. In the previous study,
a rank order correlation of only .50 was found
between ranking letters by sound and ranking
by appearance; therefore, the data to be presented
here should probably not be generalized
to cases where preference for the sound
of the letter is important.


TABLE 1

[Table body illegible in the source. Columns: Total Group,
Subgroup M, Subgroup F, Subgroup Y, Subgroup O, and Previous
Sample, each listing Letter and Mean Rank.]

Note. Ranking of the alphabet, mean rank assigned to each letter,
coefficient of concordance, mean rank order correlation among all
possible pairings of judges, and estimated reliability of ranking
for the total sample, age and sex subsamples, and a sample from
a previous study.


METHOD 


Sheets of paper containing instructions for the
task, spaces to list the alphabet in preferred order,
and the alphabet (in alphabetical order, to be used
as a check list to avoid duplicating letters in the
ranking) were distributed to the members of a
general psychology class in the summer session at
the University of Minnesota. The Ss were asked to
indicate their sex and whether they were under 25
years of age or older. This arbitrary age division was
made in order to separate the more typical college
student from the possibly atypical older summer
session student. The sample consisted of 19 males
25 or older, 41 males younger than 25, 5 females
25 or older, and 35 females younger than 25,
making a total of 100 Ss.


RESULTS 


Table 1 presents the rankings of the alpha- 
bet by the total sample, several subsamples, 
and a previous sample of 22 males in a senior 
college psychology course. (M indicates male; 
F, female; Y, younger than 25 years of age;
O, 25 years old or older.) Along with these 
rankings are shown the mean rank assigned 
to each letter (rounded to one decimal place), 
the Kendall coefficient of concordance among 
judges (W), the mean rank order correlation 
of all possible pairings of the judges (r̄), and
the estimated reliability coefficient of the 
ranking (r,). The reliability coefficient, r,, is 
the expected rank order correlation between 
the given ranking and a ranking done by a
different but comparable group of judges. It
may be noted that while the agreement among 
judges is generally low, the reliabilities of 
the group rankings are quite high, indicating 
that there are stable preferences in the popu- 
lation as a whole, although individuals differ 
rather widely among themselves. Detailed explanation
of W and r̄ may be found in Edwards
(1954) and Walker and Lev (1953),
and of r, in Edwards (1954). 
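
The three statistics reported with the rankings are closely related. The following is a minimal sketch, assuming complete rankings without ties and assuming the usual Spearman-Brown form for the reliability of a pooled ranking (Edwards, 1954, is the cited source for the exact definition used in the paper). The function names and the small example are illustrative only.

```python
from typing import List


def kendall_w(rankings: List[List[int]]) -> float:
    """Kendall's coefficient of concordance for m judges ranking n items (no ties)."""
    m = len(rankings)
    n = len(rankings[0])
    # Sum of the ranks each item received across all judges.
    totals = [sum(judge[i] for judge in rankings) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


def mean_rank_correlation(w: float, m: int) -> float:
    """Mean rank order correlation over all pairs of judges, derived from W."""
    return (m * w - 1) / (m - 1)


def pooled_ranking_reliability(r_bar: float, m: int) -> float:
    """Spearman-Brown step-up of r_bar to a ranking pooled over m judges (assumed form)."""
    return m * r_bar / (1 + (m - 1) * r_bar)


# Hypothetical example: three judges in perfect agreement on four items.
example = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
w = kendall_w(example)
r_bar = mean_rank_correlation(w, len(example))
print(w, r_bar, pooled_ranking_reliability(r_bar, len(example)))
```

Under these formulas a large group of judges can give a highly reliable pooled ranking even when W is small, which is the pattern the text describes.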

Three rank order correlations were calcu- 
lated: (a) ranking by males vs. ranking by 
females: r = .84; (b) ranking by Subsample
Y vs. ranking by Subsample O: r= .83; 
(c) ranking by males in present sample vs. 
ranking by males in previous study: r = .78. 
These correlations are all in the neighborhood 
of the estimated reliability coefficients. It 
seems justified, then, to pool all the subgroups 
for an overall ranking, since differences be- 
tween groups can be attributed to chance dif- 
ferences between different groups of judges. 


SUMMARY 


A sample of 100 college students ranked the 
alphabet according to their preference for the 
appearance of the capital letter. Rankings 
are presented for the total sample, and for 
subgroups based on age and sex. Coefficients 
of concordance among judges are low, but the 
rankings for the total sample and the age and 
sex subsamples appear to be quite reliable. 


REFERENCES 


Edwards, A. L. Statistical methods for the behavioral
sciences. New York: Rinehart, 1954.

Mecherikoff, M., & Horton, D. L. Preferences for
letters of the alphabet. J. appl. Psychol., 1959,
43, 114-116.

Walker, H. M., & Lev, J. Statistical inference. New
York: Holt, 1953.


(Received October 14, 1959) 





Journal of Applied Psychology 
1960, Vol. 44, No. 4, 254-257 


THE DEVELOPMENT AND VALIDATION OF A TEST 
OF CREATIVITY IN ENGINEERING¹


The purpose of this research was to develop
a test of creativity which would be suitable
for use in the selection and placement of
engineering personnel in industrial organizations.

Creativity in engineering was defined as the 
ability to produce a number of original ideas 
when confronted with problem situations. The 
development of the test was based upon the 
following assumptions: 

1. Highly creative engineers are able to 
produce more ideas when confronted with a 
problem situation than are less creative engi- 
neers. 

2. Highly creative engineers can change
their frame of reference, or “set,” more easily and
more quickly than can less creative engineers.

3. Highly creative engineers are more able 
to produce uncommon ideas when confronted 
with a problem situation than are less creative 
engineers. 

4. Highly creative engineers are better able 
to visualize in space than are less creative 
engineers. 


TEST DEVELOPMENT


Test items of three types were constructed. An 
example of each type is shown in Fig. 1. Six experi- 
mental forms were each made up of eight Type I 
items, four Type II items, and eight Type III items, 
in that order. Each item appeared on a separate 
page and had a 2-min. time limit.

The six experimental forms were administered
randomly to 212 sophomore and junior engineering
students and scored by counting the number of
different responses to each item. Means and standard
deviations were computed for each item. The 60
items having the greatest SDs at the various difficulty
levels were retained and assembled into three
comparable, revised experimental test forms.


1 Based upon a thesis submitted to the faculty of 
Purdue University in partial fulfillment of the re- 
quirements for the PhD degree. The research was 
sponsored by the Purdue Research Foundation under 
the direction of C. H. Lawshe. 

2 Now associated with Human Factors Research,
Inc., Los Angeles, California.


The Creativity Scores. The three revised forms 
were administered randomly to a group of 133 senior 
mechanical engineering students. Responses of this 
group were listed for each Type I and Type II item 
in each test form and grouped into categories. In 
addition, the frequency of occurrence of each cate- 
gory of each item was determined. From these data, 
the Flexibility score and the Originality score were 
developed. 

The Flexibility score was based upon the number
of different categories represented by an individual’s
responses to each Type I and Type II item.

The Originality score was based upon the weighting
of the different categories of the Flexibility score.
Those categories which occurred least frequently
were given the greatest weight.

The score based upon the number of different
responses by an individual to each Type III item was
called the Fluency score.

The Final Test Forms. The two final forms of the
test were made up of the 40 items having highest
indices of discrimination at the various levels of
difficulty. Each index of discrimination was based
upon the internal consistency criterion of the appropriate
Creativity score. Item types were arranged
in order of increasing difficulty and so as to provide
comparability of forms.
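
The Fluency, Flexibility, and Originality scores defined above can be sketched as follows. This is a minimal illustration and not the author's scoring key: the function names, the example responses and categories, and in particular the inverse-frequency weight used for Originality are assumptions. The source states only that the least frequent categories received the greatest weight. In the test itself, Fluency was taken from Type III items, and Flexibility and Originality from Type I and Type II items.

```python
from typing import Dict, List


def fluency_score(responses: List[str]) -> int:
    """Number of different responses given to one item."""
    return len(set(responses))


def flexibility_score(responses: List[str], category_of: Dict[str, str]) -> int:
    """Number of different response categories represented in one item's responses."""
    return len({category_of[r] for r in responses})


def originality_score(
    responses: List[str],
    category_of: Dict[str, str],
    category_freq: Dict[str, int],
) -> float:
    """Sum of category weights; rarer categories weigh more.

    The weight 1 / frequency is an assumed stand-in for the unpublished weights.
    """
    categories = {category_of[r] for r in responses}
    return sum(1.0 / category_freq[c] for c in categories)


# Hypothetical responses to a "list possible uses" item.
responses = ["doorstop", "paperweight", "doorstop", "hammer"]
category_of = {"doorstop": "weight", "paperweight": "weight", "hammer": "tool"}
category_freq = {"weight": 40, "tool": 4}  # how often each category occurred in a norm group

print(fluency_score(responses))
print(flexibility_score(responses, category_of))
print(originality_score(responses, category_of, category_freq))
```

Because Originality is built from the same categories as Flexibility, the two scores will tend to move together for any monotone weighting of this kind.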


INDUSTRIAL VALIDATION 


Investigations of the validity, reliability,
scorer agreement, relationships with other
tests, and “face validity” of the Creativity
scores were made with two groups of engineers
at a large automotive accessories manufacturing
company.³ The two test forms were
considered comparable and used interchangeably
in these investigations.

Validity with Group I. The 33 engineers in
this group were concerned with the development
and improvement of automotive products
such as fuel pumps, instruments and
gauges, air cleaners, and other accessory
equipment. The ability to produce a number
of original ideas was considered by their supervisors
to be of considerable importance in
the work of these engineers. In addition, the
supervisors felt that they worked closely
enough with their men to be able to evaluate
them on this ability.

3 The author is indebted to R. H. Harris, A. L.
Simberg, and R. E. Chandler, of the General Motors
Corporation, for their cooperation in this study.

Three supervisors, each of whom had about 
10 to 15 men in his group, were asked to 
evaluate their men on the ability to produce 
a number of original ideas when confronted 
by problem situations. A modified pair com- 
parison system of ranking was used in which 
each man was compared with every other 
man, but in which the order of comparison 
was not controlled. Both written and oral in- 
structions were given for the rankings and 
were directed at eliminating contamination of 
the rankings by halo effect. Engineers in the 
high half of each of the three rankings collectively
comprised the high creativity group
(N = 16) and men in the low half of each
of the three rankings collectively comprised
the low creativity group (N = 17). The high
and low groups were found to be equivalent
with respect to age, education, and years of
experience as engineers.


Fig. 1. An example of each item type. Type I: “List as many
possible uses as you can for this object.” Type II: “List as many
possible uses as you can for these two objects when they are used
together.” Type III: “What is this? List as many possibilities as
you can.” [Item drawings not reproduced.]


TABLE 1

VALIDITY COEFFICIENTS FOR GROUP I (N = 33)
AND GROUP II (N = 29)

                       Group I      Group II

Fluency score            .10          .39*
Flexibility score        .47*         .31*
Originality score     [illegible]     .18

* Significant at the .05 level.
** Significant at the .01 level.


Fluency, Flexibility, and Originality scores 
were obtained for the engineers in this study. 
The biserial correlations between each of the 
Creativity scores and the criterion dichotomy 
are presented in Table 1. These coefficients 
are of sufficient magnitude to indicate signifi- 
cant concurrent validity. 

Validity with Group II. The 29 engineers 
in this group were concerned with the devel- 
opment of procedures and machines for the 
manufacture of products. The ability to pro- 
duce new ideas for the solution of problems 
was, almost by definition, important in this 
work. 

The pair comparison procedure and rating 
instructions used with Group I were also 
used by the supervisor of this group to rank 
the 29 engineers with respect to creativity. 
The rankings correlated .31 with years of ex- 
perience in engineering, but were uncorrelated 
with age or years of formal education. The 
significant correlation with experience may 
have been evidence of halo operating and may 
have reduced the Creativity score validities, 
since the scores were not positively related 
to experience. 

The ranks derived from the pair compari- 
son ratings were transformed to standard 
scores and product-moment correlations with 
Creativity scores were obtained. These corre- 
lations are presented in Table 1. The size of 
these coefficients indicates that the test has 
significant concurrent validity. 

Reliability. Two aspects of the reliability 
of the Creativity scores were investigated. 
The degree to which the scores consistently 
measured the abilities defined by the test at 
a point in time was determined by computing 
coefficients of internal consistency. These reliabilities
were estimated through Spearman-Brown
extensions of split-half correlations.
The reliability estimates based upon scores
of 64 professional engineers are presented in
Table 2.


TABLE 2

RELIABILITY AND SCORER AGREEMENT OF THE
CREATIVITY SCORES (N = 64)

                      Reliability    Scorer Agreement

Fluency score            .93             1.00
Flexibility score        .86              .87
Originality score        .80              .87
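
The Spearman-Brown extension of a split-half correlation mentioned under Reliability can be sketched as follows. This is a minimal illustration: the odd-even split, the function names, and the example data are assumptions, since the source does not say how the halves were formed.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half: float) -> float:
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores: List[List[float]]) -> float:
    """item_scores[p][i] is person p's score on item i; halves are odd vs. even items."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd, even))


# Hypothetical item scores for four persons on four items.
scores = [[1, 1, 1, 1], [2, 1, 2, 2], [3, 3, 3, 2], [4, 4, 4, 4]]
print(split_half_reliability(scores))
```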

A second aspect of reliability was the agree- 
ment between scorers of the test. Scores for 
the 64 engineers were obtained independently 
by two scorers, one of whom was not associ- 
ated with the research project. Product- 
moment correlations between the two sets of 
scores are also presented in Table 2. 

Relationships between Creativity Scores 
and Other Tests. Fluency, Flexibility, and 
Originality scores were correlated with the 
Wonderlic Personnel Test, a test of mental 
ability (Wonderlic, 1942), the Mechanical 
Comprehension Test Form CC (Owens & 
Bennett, 1949), and the AC Test of Creative 
Ability (Harris, 1955). These product- 
moment correlations are presented in Table 3. 

The Creativity scores appeared to be quite 
independent of the Wonderlic test of mental 
alertness and the Mechanical Comprehension 
Test Form CC. This was due, in part at least, 
to the restriction in the range of the scores 
on these two tests. The purpose, however, in 
investigating these relationships was to ascer- 
tain that the common variance between the 
Creativity scores and the criterion could not 
as well be accounted for by these other two 
predictors. 

The Creativity scores were all significantly 
related to the AC Test of Creative Ability, 
although, for the most part, these correlations
were not high. A significant relationship was
expected here because the two tests had exhibited
validity in similar situations.

Face Validity. For a test to be used suc- 
cessfully in the selection and placement of 
professional engineers, it should have at least
some amount of acceptance by the persons
being tested. To obtain a measure of “face 
validity” of the Creativity Test, each engineer 
tested was asked the question, “Do you think 
that the test you have just taken can measure 
creativity in engineering?” This question was 
part of a confidential general information 
questionnaire that each engineer was asked 
to complete after he had taken the test. 

The degree of acceptance of the test was 
indicated by the proportion of favorable re- 
sponses to this question. Of the 64 engineers, 
42 responded favorably, 18 responded nega- 
tively, and 4 gave neutral responses. 

Correlations between Creativity Scores. 
Product-moment correlations were computed 
between Fluency, Flexibility, and Originality 
scores for the total sample of 64 engineers. 
These correlations were: Fluency vs. Flexi- 
bility .56, Fluency vs. Originality .49, and 
Flexibility vs. Originality .95. 

Even though the Originality score had ex- 
hibited a degree of unique variance in the 
individual validation studies, the high corre- 
lation between Flexibility and Originality 
scores for the total sample indicated little of 
value in the extra effort required to obtain 
the Originality scores. 


SUMMARY AND CONCLUSIONS 


Two forms of a 20-item test of creativity 
were developed through analyses of item re- 
sponse data of 345 engineering students at 
Purdue University. Three scores were devel- 
oped for the test: Fluency score, Flexibility 
score, and Originality score. Investigations of 
the validity, reliability, interscorer agreement, 
relationships with other tests, and “face va- 


TABLE 3

CORRELATIONS BETWEEN CREATIVITY SCORES
AND OTHER TESTS

[Table body largely illegible in the source. Columns: Wonderlic
(N = 48), Mech. Comp. CC (N = 43), and AC. Legible entries:
Fluency score, 08; Flexibility score, 13; Originality score, 5, 08.]

* Significant at the .05 level.
** Significant at the .01 level.


THE VALIDITY OF THE GUILTY KNOWLEDGE TECHNIQUE:
THE EFFECTS

DAVID T

University of Minnesota


Contrary to what many psychologists believe,
most professional lie detector operators
really assume that they are in the business of
lie detection. Although various techniques are
employed, all are predicated on the belief
that there is a distinctive pattern of physiological
response which accompanies lying and
which can be distinguished from that which
accompanies truth-telling. Thus, “Whatever
the measuring instrument used, the underlying
psychological principle is identical,
namely, that the tension occurring with deception
is different from the tension occurring
in response to the similar stimuli to
which the subject answers truthfully” (Block,
Salpeter, Tobach, Kubis & Welch, 1952, p.
55). There have been many reports of validities
above .90 for conventional lie detector
procedures in actual criminal investigations
(e.g., Lee, 1953; Marston, 1938; Summers,
1939). However, I can find no published
accounts of properly conducted studies which
corroborate such claims. Nor have experiments
conducted under artificial or laboratory
conditions produced validities nearly so high
(e.g., Ellson, 1952). The work of Lacey
(1950) and others, demonstrating that physiological
response patterns show great variation
from one individual to another, allows little
credence for the notion that all persons can
be expected to show the same characteristic
pattern when lying and some different pattern
when telling the truth. The fact that
most lie detector enthusiasts have been specialists
in criminal investigation rather than
in psychological measurement is perhaps sufficient
to account for these optimistic claims.

1 Richard Rose, George Skaff, and Joe Ylitalo
conducted this experiment.
2 This study was reported during the author’s
tenure as a Fellow of the Center for Advanced Study
in the Behavioral Sciences, Stanford, California.

One basis for confusion in the existing literature
is the failure to distinguish between
lie detection on the one hand and guilt detection
on the other. The method of guilt detection
has been described in a previous
paper (Lykken, 1959a).


Use of physiological measurements to detect, not
lying, but the presence of “guilty knowledge,” requires
only the more reasonable assumption that a
guilty person will show some involuntary physiological
response (e.g., GSR) to stimuli related to remembered
details of his crime. If the crime is such that
the investigator can discover a number of factual
details with which only the guilty person should be
familiar, then the guilty knowledge method can be
used. The guilty knowledge items are interspersed
with other similar but irrelevant items in a stimulus
list. The S is told that E is going to mention a number
of items and that, if he is guilty, he will recognize
some of these items as being related to the
crime in question. The items may be stated in question
form, in which case the S may or may not be
required to answer. A guilty S, knowing which items
are relevant and which are not, would be expected
to respond differently to the relevant than to the
irrelevant items. Usually, he would be expected to
give larger responses to the relevant items, although
it should be pointed out that any consistent difference
in the responses to the two classes of stimuli is
evidence of guilt.


In an earlier study (Lykken, 1959a), 49
male college students, after random assortment
into four groups, were required to enact
one, both, or neither of two mock crimes.
All were then given a “guilty knowledge”
test, employing the GSR, which used six
standard questions relating to each of the two
crimes. A simple, objective, and a priori scoring
system was used to determine “guilt.”
Forty-four or 89.8% of the Ss were assigned
to their correct group, against a chance expectancy
of 25%. Considering the crimes
separately, all Innocent Ss were correctly
classified, while 44 of 50 interrogations of
Guilty Ss gave “guilty” classifications, a total
of 93.9% correct classification against the
chance expectancy of 50%. The present experiment
was designed to test the hypothesis
that with a more comprehensive interrogation
and a more subtle, although still objective,
scoring system being employed, the guilty
knowledge method can give nearly perfect
validity even with sophisticated Ss who are
motivated to attempt to subvert the test.


METHOD 


The 20 Ss used included a number of medical 
students, several staff psychologists and psychia- 
trists, and a number of female members of the sec- 
retarial staff. Each of the Ss had been required 
earlier to fill out a questionnaire containing 25 items 
such as “What is your father’s first name?”, “What 
was the name of the street that you lived on when 
you were a child?”, “What was the name of your 
high school?”. The answers to this questionnaire 
then constituted the set of guilty knowledge items 
characteristic of the S. The questionnaire responses 
of the first five Ss, all medical students, were put 
together to make up the original interrogation list 
The first question on the list was “What was your 
mother’s first name?” and there followed a set of 
six women’s names, five of them being the names 
of the mothers of the first five Ss, and the first 
alternative being a name taken at random. The next 
question, “What was your father’s first name?” had 
these Ss’ fathers’ names, in scrambled order, as 
Alternatives 2 through 6, with a man’s name taken 
at random as the first alternative. All of the ques- 
tions followed the same pattern, with the first alter- 
This pro- 
cedure, which made it possible to ignore for scoring 
purposes the response to the first alternative, elimi- 
nated the difficulty caused by the tendency of many 
Ss to give a larger GSR, other things equal, to the 
first item in any series 

The object of the interrogation was to correctly 
identify by an objective analysis of the GSR proto- 
col, who S was, ie., 


native in each set always being random 


which set of questionnaire re- 
sponses was his. Since he “might have been” any one 
of the five persons whose questionnaire 
went into the interrogation list, the scoring method 
matched his protocol against each of these five possi- 
bilities and defined a criterion to decide which of 
the five he must be. For all Ss after these first five, 
the interrogation list was constructed merely by sub- 
stituting the 25 questionnaire responses of the new 
S in place of those belonging to one of the first 
five Ss. Thus, the last 15 Ss were all scored “against” 
the same four individuals. 

Scoring Procedure. The first step in scoring a 
record was to rank the GSRs to the five alternatives 
for each question (ignoring the first) in order of 
amplitude. Then a distribution was made of the 
ranks of the GSRs to the 25 alternatives from one 
of the five questionnaires used in the list. If the 
questionnaire did not belong to the 


responses 


S in question, 
then he should have no way of responding consist- 
> 


ently the same to these 25 items, and one would 
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expect the distribution of ranks to be roughly rec- 
tangular, i.e., about as many ranks of one, as of two, 
as of three, etc. But, if the items did come from that 
S’s questionnaire, then some peculiarity would be 
expected in the distribution. If he had not been 
successful in producing false responses, for example, 
he should have mostly ranks of one (this was the 
most common result). Or, if he had produced many 
large GSRs to the Innocent alternatives, he should 
have mostly ranks of four or five. Even if he had 
been clever enough to consistently produce large 
GSRs to just two of each four Innocent alternatives, 
the distribution of ranks would still betray him in 
showing mostly ranks of three, with few 
fives. 

The expectation was that each distribution of 25 
ranks would be rectangular, ie., five ranks of one, 
five of two, etc. Each such distribution was given 
a score by awarding a point for each entry over 
five in any of the five rank positions. Thus, a 
perfectly rectangular distribution would have a score 
of zero, indicating minimum likelihood that the S 
“belonged” to the given set of items, while a distri 
bution in which all 25 responses had the same rank 
would get a score of 20. When no measurable GSR 
occurred to a given item, no rank was assigned to 
that item and the distribution of ranks for that list 
therefore totaled less than 25. In such 
expected frequency of ranks in each position was 
figured at one-fifth of the total and the distribution 
scored in the usual way 

Conditions and Instructions. While being 
tioned, S was seated in the interrogation room with 
a blindfold over his eyes and a pair of headphones 
adjusted to his ears. E was located with the appara- 
tus in an adjoining room and spoke to S over a 
microphone. The GSR electrodes were the two- 
element lead type (Lykken, 1959b) fixed over the 
fingerprint area of the first and third fingers of the 
dominant hand. The electrolyte was Sanborn Redux, 
and the effective electrode area was a circle of 
*%-in. diameter. A constant dc current of 50 “a. was 
employed, and total skin resistance and skin resist- 
ance changes (GSRs) were written out independ- 
ently on rectilinear coordinates and at better than 
5% accuracy, using a “recti-riter’” recording milli- 
ammeter. 

Each S was given a 15-min. lecture on the nature 
of the GSR, the lie detector in general, and the 
principle of the guilty knowledge method in par- 
ticular. After being attached to the GSR electrodes, 
each S was allowed to sit before the recording in- 
strument and practice producing voluntary GSRs by 
various methods. Each S was told what the format 
of the questioning would be, was cautioned against 
attempting to defeat the test merely by inhibiting 
responses, and was advised (correctly) that the best 
way to confuse the scoring system would be to 
produce GSRs of various amplitudes to the innocent 
alternatives in as random a pattern as possible. Each 
S was then offered a prize of $10.00 if he could by
any such means manage to defeat the objective
scoring system being used.


TABLE 1
SCORES ON EACH OF THE FIVE QUESTION LISTS,
EXPRESSED AS PERCENTAGE OF S’S
SCORE ON HIS OWN LIST
Question Lists
Mean Percentage = 43.67.
Range 18-86.


RESULTS 


The scoring system employed here had two 
minor defects, both of which operated to work 
against the power of the test. First, since a 
number of the Ss were acquainted, there were 
some cases in which an S did recognize sev- 
eral of the innocent (for him) alternatives as 
belonging to the same individual. Secondly, 
since each S was “guilty” with respect to one 
set of 25 alternatives, his tendency to get an 
asymmetrical distribution of ranks for his 
responses to these items necessarily prevented 
the distribution of ranks for his responses to 
the other sets from being truly rectangular. 
This decreased the discriminating power of 
the scoring but, although a statistical correc- 
tion could have been made in each case, this 
was not done since the simpler method still 
gave 100% correct identification. The results
of the experiment, that is, were that the Ss 
were correctly matched with their own sets 
of questionnaire responses in all 20 cases,
with no ambiguities and by a completely ob- 
jective a priori scoring system (see Table 1). 
Assuming a chance probability of a correct 
match being 0.20 in each case and these prob- 
abilities being independent (this scoring could 
have matched all 20 Ss with the same ques- 
tionnaire), this result is obviously significant 
(p < 10⁻¹³).
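
The order of magnitude of this probability can be checked directly from the two stated assumptions (a chance probability of 0.20 per match, and independence across the 20 Ss). The short sketch below is illustrative only; the function name is hypothetical.

```python
def chance_of_all_correct(p_single: float, n_subjects: int) -> float:
    """Probability that every one of n independent matches is correct by chance."""
    return p_single ** n_subjects


# Values taken from the text: five question lists, 20 Ss.
p = chance_of_all_correct(0.20, 20)
print(f"{p:.3e}")
```

Under these assumptions the chance probability of 20 correct matches out of 20 is on the order of 10⁻¹⁴.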
DISCUSSION 

The guilty knowledge technique, of course, 
is not new. Every psychology student has seen 
it demonstrated using the GSR and “a number
between one and five.” In one form or
another it also appears repeatedly in the lie 
detection literature. Thus, the “peak of ten- 
sion” test as described by Keeler (1933) 
originally involved presenting to the suspect 
a list of related items of which one was a 
“significant” item and looking to the response 
record for signs of increased physiological 
“tension” up to the significant item, decreas- 
ing thereafter. When only the guilty suspect 
knows which is the significant item, this is a 
crude form of the guilty knowledge test and 
is, potentially, an objective and accurate 
method of guilt detection. But many opera- 
tors now make a practice of showing the list 
beforehand to the suspect, in order to en- 
hance his apprehension of the critical item, 
and often this item is merely a direct question 
of guilt. Thus, the “peak of tension test” has 
become essentially just another fallible lie 
detector procedure. The “indirect” or “association”
method described by Lee (1953) is
even more similar to what is here called the 
guilty knowledge technique but is classified 
by that author as a lie detection procedure. 
A clear recognition that guilt detection is a 
different procedure, inherently much more 
dependable than lie detection, and that it is 
based on the diagnosis of guilty knowledge, 
specifically, should contribute to the develop- 
ment of instrumental interrogation in several 
ways. One result should be the adoption of a
standard format and objective scoring system
which would eliminate the vagaries of sub-
jective “expert” judgment. Such a develop-
ment, which would put the operator in a posi-
tion analogous to the fingerprint expert,
should increase the willingness of police de-
tectives to make use of these facilities early in
the case at the time when a successful appli-
cation of the method is most likely.

A common attitude in the lie detection 
field (e.g., Lee, 1953) is that the GSR is not 
a useful physiological datum because, para- 
doxically, it is “too sensitive.” Again, the 
difficulty seems to stem from the use of lie 
detection rather than guilt detection methods. 
Undoubtedly cases will be found in which 
the guilty S will experience a strong enough 
reaction to a direct accusation of guilt to 
show a marked cardiac and respiratory reac- 
tion not shown to irrelevant questions. Such 
a clearly differentiated “lie response” will 
seldom be seen in a GSR record, where re- 
sponses to all questions will tend to be the 
rule. But for guilt detection, the extraordi- 
nary sensitivity of the GSR is a clear virtue, 
as is the relative simplicity of the GSR curve 
where basal level, latency, and amplitude of 
the response are easily measured and have a 
clear significance. As long as the S responds 
at all (and I have never observed a failure 
to respond when proper measuring techniques 
were used) there is no reason to suppose that 
blood pressure or pneumographic records add 
any useful information to that provided by 
the GSR, appropriately used in the guilt de- 
tection paradigm. 

The experiment reported here seems to 
testify, as conclusively as such laboratory 
studies can, that the guilty knowledge method 
can yield extremely high validities, even with 
sophisticated defensive Ss, under conditions 
appropriate to its use; i.e., when enough 
“guilty knowledge” is available to the opera- 
tor to enable him to construct an adequate 
interrogation list. Since this guilty knowledge 
material can involve completely inconsequen- 
tial matters and need not refer to the more 
dramatic and publicized aspects of the crime 
in question (e.g., what the weapon was, what 
was stolen, etc.), it would seem that com-
petent investigation should be able to provide
enough appropriate material in a large num-
ber of criminal cases, even when there has
been considerable publicity or even prior
questioning of suspects. Since lie detection, in
contrast with the guilty knowledge method,
can be used in all cases (e.g., whenever there
is a suspect willing to be questioned), an
appropriate comparative field study might
show that the overall validity of the conven- 
tional method is higher. However, the advan- 
tages of having available a guilt detection 
method of nearly perfect validity where it 
can be used are obvious. 

The fact that the guilty knowledge tech- 
nique, unlike the lie detector, does not require 
that the S answer any questions or, indeed, 
say anything at all, may have helpful legal 
implications. Since the S is not required to
speak it is clear that he is not “testifying
against himself” except in the same trivial
sense as when he is made to show his face
to a witness, and unlike the blood test situa-
tion no pain or violation of his physical
integrity is involved.


SUMMARY 


A distinction is made between instrumental 
methods of lie detection and of guilt detec- 
tion. In the absence of adequate data support- 
ing claims of high validity for lie detection 
procedures in criminal investigation, and 
since present knowledge of physiological re- 
sponse patterns argues against the assump- 
tion that all persons respond one way when 
lying and another when not, these claims are 
considered unacceptable. 

A method of guilt detection using the GSR 
is described, which involves presenting the S 
with a set of questions concerning matters 
which could be known only by a guilty indi- 
vidual. Each question is followed by four or 
five alternatives, of which one is “correct.” 
Scoring the response record for “guilt” in- 
volves identifying any pattern of nonrandom 
reactivity to the set of “guilty” alternatives. 
Each S is used as his own control and the 
scoring is entirely objective. 

An experiment is reported in which 20 
sophisticated Ss were given training in the 
theory of the GSR and of the guilty knowl-
edge method, were allowed to practice inhibit-
ing or producing false GSRs, and were instructed
concerning the interrogation procedure and
scoring system to be used. These Ss were then 
offered $10.00 if they could “beat” the test. 
Correct classification was obtained in 100% 
of these cases without ambiguity, using objec-
tive scoring of the GSR protocol alone.


Common observation would indicate that
frequently encountered patterns of noise are 
better tolerated and less distracting than un- 
familiar ones. Little work has been done on 
this problem, though it has been shown that 
marked preferences for expected frequency 
ranges in musical reproduction develop 
through learning (Kirk, 1956). If this is true 
of noise as well, the fact should be taken 
into account in evaluating studies purporting 
to show relative annoyance values of different 
sound spectra of comparable over-all sound 
pressure level. For example, persons not ac- 
customed to the sound of jet aircraft noise 
may find it more annoying than that of pro- 
peller aircraft only because they are more 
accustomed to the latter. This hypothesis fits 
the principle that unfamiliar stimuli are, in 
general, more startling than are familiar 
stimuli of comparable magnitude. The annoy-
ing property may be in large part a proprio-
ceptive and interoceptive feedback from the
startle itself. A few experiences of the new
pattern, if it turns out not to precede a
threatening situation, are usually sufficient to
lead to a marked reduction in its disturbing
quality. To test the hypothesis in respect to
aircraft noise, the following experimental
situation was arranged.


PROCEDURE 


Tape recordings of fly-overs by two airplanes were
made. Both of the airplanes were four-engine pas- 
senger types, but one was jet-driven while the other 
was propeller-driven. The recordings were made at 
a point under the flight path 5600 meters from brake 
release. 

A room was acoustically balanced in such a way 
that the recordings, played as endless tapes, repro-
duced with fidelity the actual sounds of the two
aircraft in all octave bands. In addition, a spectrum
shaper was included in the circuits to provide a 
means of compensating for slight deviations from 
the original spectrum. The play-back system was 
arranged so that the E could use the sound of either 
aircraft as a standard, varying its SPL by means of 
a decade attenuator. The other variable channel was
circuited through a very good flat-response speaker
located above and in front of the subject. The over- 
all SPL was adjusted to 99 db for the jet airplane 
and 107 db for the propeller-driven airplane, in 
accordance with the actual levels of the original 
recordings. The S’s attenuator controlled the variable 
sound in steps of 2 db to plus or minus 40 db from 
the original setting. The SPL was checked at the 
beginning of each experimental session.

Ss were undergraduate students in summer school 
at the University of Washington. All Ss were paid 
at the rate of $1.00 for a single session (two trials) 
or $3.00 for three sessions with a bonus of $2.00 for 
completing the series of trials. They were not told
the purpose of the experiment, nor were the air-
planes identified in any way. In addition, they were
put through several other tests and procedures de-
signed to mislead the Ss as to the aim of the study.
Informal conversation with each S at the conclusion
of the study indicated that none of them suspected
the hypothesis being tested.

The basic procedure was patterned after that used 
by Kryter, who was interested in comparing annoy-
ance levels of aircraft noise, though not habituation
effects. An S was brought into the experimental 
room, seated at the table with an attenuator knob 
before him, and given the following instructions: 


The purpose of these tests is to determine the
relative acceptability of noises from different types
of aircraft. The tests are part of a program of
research designed to obtain information that will
be of aid in the design of military and civilian
airports.

You will first hear the noise an aircraft makes
passing overhead. Then after a time, you
will hear the noise another type of aircraft makes
passing overhead. We will call the first noise our
“standard” noise and the second noise our “com-
parison” noise. The duration of the comparison
noise (No. 2) may be the same as or shorter or
longer than the standard noise (No. 1). You can- 
not change the duration of either noise but you 
can change the over-all level of the comparison 
noise by turning the knob on the attenuator that 
is in front of you. Your job is to listen first to 
the standard noise, then listen to the comparison 
noise, and then to adjust the intensity of the com- 
parison noise until it sounds as acceptable to you 
as the standard. By equally acceptable we mean 
that you would just as soon have the standard as 
the comparison noise in your home periodically
20 to 30 times during the day and night; in other
words we mean by equally acceptable that the
comparison noise would be no more and no less
disturbing to you in your home than the standard
noise.

To repeat. First you will hear the standard
noise, then, a few seconds later, the noise which
you are able to change by means of the knob in
front of you. The idea is to adjust the second
noise so that it is just as annoying as the first
one, no more, no less. Are there any questions?


The S was then presented with the standard noise, 
a short pause of standard length, and then the 
variable noise. Each time, just before the standard 
noise, the E said, “This is the standard noise.” Be- 
fore the second noise he said, “This is the compari- 
son noise; the one you are able to adjust.” 

Pre-experimental Ss usually had made a setting
which satisfied them after five pairs of fly-overs.
For this reason, a minimum of five pairs was used
for each judgment, but, if necessary, additional pairs
were presented until the Ss stopped making adjust-
ments of the attenuator. Ss were then asked if they
felt the sounds were about equally annoying. When
they indicated they were satisfied, the fly-overs were
stopped. About 15% of the Ss used more than five
comparisons. None asked for more than three extra
pairings.

The S’s attenuator was covered so that he could
not see the initial setting, nor any of his changes.
Neither was he told the amount of the attenuator’s
effect. When the S indicated that he had fulfilled the
instructions he was led out of the room, his attenua-
tor setting was recorded, and the attenuator was
reset.

For the next 20 minutes the S was given a difficult 
task in reading aircraft instruments from projected 
slides. During this task he was placed in any one of 
three noise conditions: (a) silence, (b) the random 
ringing of a loud bell, (c) continuous playing of the 
recorded fly-overs of the jet noise at 99 db. During
the three weekly sessions this task was repeated for
the S in just the same way.

After the instrument reading task, the S returned
to the original experimental situation and repeated
the aircraft noise adjustments as he had done before.

EXPERIMENT I


Thirty-two Ss were divided into two ran-
dom groups of 16 Ss each. For the members
of one of these groups the noise of the pro-
peller airplane was always the invariable
standard noise, but Ss could control the noise
of the jet airplane. For the members of the
other group the reverse was true.

Four of the 32 Ss failed to complete the
series of six trials for reasons that were inde-
pendent of the experiment (illness in family,
left school, etc.). Twenty-eight who appeared
for two trials at each of the three sessions
over a period of three weeks constitute the
experimental group of Experiment I.

Results 


The figures in Table 1 show, in decibels, 
the “penalty” imposed upon the jet airplane 
for each trial. The number given is the mean 
difference in SPL between the two noises 
when equated by Ss for “annoyance.” A t test
of the significance of the difference between
the mean penalty of Trial 1 and the mean
penalty of Trial 6 shows the total decrease
of 5.2 db to be significant well beyond the
1% level. A t test made between the mean of
the pooled penalty scores of the first session 
(Trials 1 and 2) and the mean of the pooled 
penalty scores of the third session (Trials 5 
and 6) shows this difference to be significant 
at the 1% level also, as is the difference be- 
tween the first and second sessions. The dif- 
ference between the second and third sessions 
is not significant. Despite the significant trend 
of the entire group, six Ss show an increase 
in penalty between the first and last trials 
though this was generally very slight. On the 
other hand, two Ss, starting on the first trial 
with penalties respectively of 11 and 13 db, 
ended up with minus penalties, preferring the 
jet noise at a given SPL. Both of these Ss 
showed a steady drop in penalty throughout 
the three-week period. As mentioned earlier, 
the standard for one group was the noise 
of the propeller airplane, for the other group, 
the jet airplane. It was thus possible to com- 
pare change in penalty assessed by each of 
the two groups separately. An inspection of 
the two sets of data shows that the decrease 
was found in both groups, but that the group 
using the jet noise as standard was respon- 
sible for most of the change. The difference 
between the groups is not a significant one, 
however. 
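
The "penalty" used in these results can be illustrated with a short sketch. It assumes, as a reading of the Procedure, that the penalty is the propeller SPL minus the jet SPL at the point where the S judges the two noises equally acceptable, and that the S's attenuator shifts only the comparison noise from its original level (99 db for the jet, 107 db for the propeller airplane). The function name and the example attenuator settings are hypothetical.

```python
JET_ORIGINAL_DB = 99.0         # over-all SPL of the jet recording (from the text)
PROPELLER_ORIGINAL_DB = 107.0  # over-all SPL of the propeller recording


def jet_penalty(standard: str, attenuator_offset_db: float) -> float:
    """Penalty against the jet noise, in db, for one equal-annoyance setting.

    standard: which noise was the fixed standard, "propeller" or "jet".
    attenuator_offset_db: the S's change to the comparison noise, relative
    to its original level (negative means the S turned it down).
    A positive penalty means the jet had to be quieter than the propeller
    noise to be judged equally acceptable.
    """
    if standard == "propeller":
        jet_db = JET_ORIGINAL_DB + attenuator_offset_db
        propeller_db = PROPELLER_ORIGINAL_DB
    elif standard == "jet":
        jet_db = JET_ORIGINAL_DB
        propeller_db = PROPELLER_ORIGINAL_DB + attenuator_offset_db
    else:
        raise ValueError("standard must be 'propeller' or 'jet'")
    return propeller_db - jet_db


# Hypothetical settings, in the 2-db steps the attenuator allowed.
print(jet_penalty("propeller", -4.0))  # S lowered the jet noise by 4 db
print(jet_penalty("jet", 2.0))         # S raised the propeller noise by 2 db
```

On this reading, a negative penalty corresponds to the two Ss who ended by preferring the jet noise at a given SPL.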

As a further control for Exp. I, 20 “new”
Ss were interspersed among the “old” Ss
during the last week, that is, Session 3 for
the old Ss. The mean penalty for the new Ss
was 12 db on both the first and second trials,
significantly different from the 8.6 and 8.9
db for the habituated ones.





Acoustical Energy Spectrum 


EXPERIMENT II 


This experiment was designed to determine 
whether Ss would show increased amounts of 
habituation if they were to be busily engaged 
in activity possibly inhibitive of specific 
annoyance responses while a comparatively 
large number of fly-overs took place. An ex- 
perimental group and two control groups were 
used. Ss in all groups were treated identically 
as described in Exp. I except for the activity 
during the interval between the two trials of 
each session. All persons in all three groups 
were given a difficult task in the reading of 
airplane instruments, as in Exp. I, but the 
experimental group was simultaneously sub- 
jected to fly-overs of the jet airplane while 
members of the other two groups were sub- 
jected respectively to silence and to the sound 
of a loud bell at aperiodic intervals. 


Results 


The changes in penalty during successive 
trials for each group are shown in Table 2. 
Differences between groups for Trial 1 must 
be assigned to nonexperimental factors, since 
all Ss were treated alike and the differences 
are not, in any case, significant. All groups 
showed significantly decreased penalties from 
first trial to last, confirming the findings in 
Exp. I. 

All of the differences between the group
exposed to fly-overs and the other two groups
are in the direction that would be expected
from the hypothesis of decrease in annoyance
as the result of habituation. The fly-over
group shows the greatest drop from first trial
to second trial, from first week to second
week, and from first trial to last trial. Each
of these differences barely escapes statistical
significance however, possibly because of rela-
tively large S variability within groups. An-
other possibility is that habituation occurs
rather rapidly, and that the number of fly-
overs during the first session (Trials 1 and 2)
is enough in each of the control groups to
bring the penalty down to a level below
which the additional fly-overs in the experi-
mental group would not have an easily
measurable effect.


TABLE 1

DIFFERENCE IN DB BETWEEN JET AND PROPELLER
NOISE AT EQUAL-ANNOYANCE LEVEL

(db re dynes per cm²)

Week (i.e., Session)
First
First
Second
Second
Third
Third


TABLE 2

PENALTY CHANGES FOR AN EXPERIMENTAL GROUP
EXPOSED TO INTERTRIAL FLY-OVERS COM-
PARED TO CONTROL GROUPS

(db re dynes per cm²)

                  Group J           Group S       Group B
                  (Jet Fly-overs)   (Silence)     (Bell)
                  N = 13            N = 12        N = 13
Week              Penalty in        Penalty in    Penalty in
(i.e., Session)   Decibels          Decibels      Decibels
First             13.6   11
First             14.1   11
Second
Second            12.1
Third
Third

DISCUSSION 


All procedures used to examine the hy- 
pothesis that a decreasing penalty against the 
jet noise would occur as a result of repeated 
exposure showed a significant trend in the 
hypothesized direction. It may be reasonably 
held, however, that the findings are consist- 
ent with the possibility that the propeller 
noise was becoming more annoying, that both 
noises were becoming less (or more) annoy- 
ing but in different degrees, or that the pro- 
peller noise became more annoying while the 
jet noise was becoming less annoying. The 
procedure provides no formal check of these 
possibilities, nor does any feasible method 
suggest itself. There are nevertheless several
bits of evidence indicating that “absolute”
annoyance levels did not rise. After finishing
the experiment, Ss were asked to discuss the 
whole experience as long as they could be 
persuaded to do so. None remarked that he 
felt successive sessions more trying. On the 
contrary most of them found the situation to 
be far less disturbing on the last trial than 
on the first, though this feeling could have 
been related in part to general familiarity 
with the experimental situation. A few volun- 
teered the opinion that the noises were louder 
in the first session than subsequently. Strictly 
speaking, however, the experiments showed 
only a decreasing difference in annoyance 
between the noises. 


SUMMARY 


The possibility that airplane jet-engine
noise of a given SPL would become relatively
less annoying to Ss after repeated exposure
was tested in a situation in which the (taped)
S. Culbert and M. I. Posner 


noise was compared to that of a propeller- 
driven airplane. A group of 28 Ss showed a 
significant increase in tolerance for the jet- 
engine noise (in comparison to propeller 
noise) after two series of exposure trials a 
week for three consecutive weeks. The toler- 
ance for the habituated group at the end of 
three weeks was also significantly greater than 
that shown by 20 control Ss tested then for 
the first time. A test using additional Ss in 
another experiment corroborated the results 
of the first test, but no significant differences 
were found between those Ss exposed to inter- 
trial fly-overs while reading airplane instru- 
ments and those assigned to the same inter- 
trial task while exposed to a loud bell or 
silence.


THE RELATION BETWEEN SPEAKING TIMES AND
DECISION IN THE EMPLOYMENT INTERVIEW 1


C. W. ANDERSON 


McGill University 


The problem of how an interviewer’s deci- 
sion to accept or reject an applicant is related 
to events in the interview has received little 
attention in most studies of the employment 
interview. Wagner (1949) has reviewed a 
number of investigations of the interview in
which the emphasis has been upon the relia- 
bility and validity of the resulting judgment 
or decision. In studies of the interview re- 
ported by Saslow, Matarazzo, Phillips, and 
Matarazzo (1957), and Goldman-Eisler 
(1954), patterns of interaction have been 
examined without direct consideration of 
their relation to the decision of the inter- 
viewer. Daniels and Otis (1950) have 
examined the interrelation of events in the 
employment interview without reference to 
the decision. 

The present study was undertaken to in- 
vestigate the possibility that speaking times 
are one class of the events in the interview 
that are related to the interviewer’s decision. 


PROCEDURE 


The sample consisted of 115 disk recordings of 
employment interviews conducted by six Army per- 
sonnel officers.2 The applicant was accepted by the 
interviewer in 70 cases and rejected in 45 cases. It 
should be noted that the Army in Canada is a small
professional body maintained by voluntary enlist-
ment. Selection procedures in the Canadian Army
appear to be at least as rigorous as those in
industry.

Two stop watches were used to measure the time
the applicant spoke and the time the interviewer
spoke in each interview. The total time was meas-
ured by starting a third watch at the beginning of
the interview and stopping it at the end. The vacant
time was determined by subtracting the speaking
times from the total time. Readings of the times
were taken to the nearest minute.


1 This study was supported by a grant from De-
fence Research Board, Department of National
Defence, Ottawa.

2 These interview recordings were collected by
D. Sydiaha, now Assistant Professor of Psychology,
University of Saskatchewan.


RESULTS 


The correlations of the applicant time, the 
interviewer time, and the vacant time with 
the total time were .721, .674, and .789, re- 
spectively. The effects of these correlations 
were removed from the intercorrelations of 
the applicant time, the interviewer time, and 
the vacant time. Table 1 shows the resulting 
negative partial correlations which are all 
significant at the .01 level. Both the applicant 
time and the vacant time varied inversely 
with the interviewer time. The negative cor- 
relation between the interviewer time and the 
vacant time is larger than that between the 
applicant time and the vacant time. The dif- 
ference between these correlations is signifi- 
cant at the .05 level. 
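
Removing the effect of total time from the correlation between two of the time measures is conventionally done with the first-order partial correlation. The sketch below assumes that this standard formula is what was used, which the text does not state explicitly. The zero-order correlation between applicant time and interviewer time is not reported, so the value of 0.15 below is a hypothetical placeholder; the correlations with total time (.721 and .674) are taken from the text.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return numerator / denominator


r_applicant_total = 0.721        # reported in the text
r_interviewer_total = 0.674      # reported in the text
r_applicant_interviewer = 0.15   # hypothetical zero-order value

print(round(partial_correlation(r_applicant_interviewer,
                                r_applicant_total,
                                r_interviewer_total), 3))
```

The example shows how two measures that each correlate positively with total time can still be negatively related once total time is held constant, which is the pattern reported in Table 1.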

Table 2 shows the averages and standard 
deviations of the speaking times in interviews 
with applicants who were accepted and in 
interviews with those who were rejected. The 
interviewer time was greater and the vacant 
time was less in interviews with applicants 
who were accepted than in interviews with 
those who were rejected. These differences are 
significant at the .01 level. Both the appli-
cant time and the total time were less in
interviews with applicants who were accepted
than in interviews with those who were re-
jected. However, these differences are not
significant.


TABLE 1

PARTIAL INTERCORRELATIONS OF INTERVIEWER
TIME, APPLICANT TIME, AND VACANT TIME

(EFFECTS OF TOTAL TIME REMOVED)

                    Interviewer    Vacant
                    Time           Time
Applicant Time      -.651          -.298
Interviewer Time                   -.531

Note. r = -.254 is significant at the .01 level; the difference
between r = -.298 and r = -.531 is significant at the .05 level.


TABLE 2

MEANS AND SDs IN INTERVIEWS WITH APPLICANTS WHO WERE
ACCEPTED AND APPLICANTS WHO WERE REJECTED

                                       Time in Minutes
Interviewer’s   Number of     Applicant       Interviewer     Vacant    Total
Decision        Interviews    Mean    SD      Mean    SD      Mean      Mean
Acceptance      70            7.81    4.48    9.      27      7.87      13
Rejection       45            8.44    4.39    6       42      10.49     25

Note. The differences in Interviewer time and in Vacant time between
Acceptance and Rejection cases are significant at the .01 level.


DISCUSSION


The partial correlations in Table 1 are
consistent with the idea that the interviewer
influences both the amount the applicant
talks and the amount of time in the inter-
view during which no speech occurs.

It appears that the amount of time in the
interview that is free of speech is determined
to a greater extent by the interviewer than
by the applicant.

The average speaking times in Table 2
indicate that:


1. The interviewer talks more in inter-
   views with applicants he accepts than
   in interviews with those he rejects.

2. The amount of time free of speech is
   less in interviews with applicants who
   are accepted than in interviews with
   those who are rejected.

3. The amount the applicant talks is about
   the same in the acceptance cases as in
   the rejection cases.

4. The length of the interview is about the
   same in the acceptance cases as in the
   rejection cases.


The decision to accept or reject the appli-
cant is, of course, the interviewer’s. It has
been suggested that the amount of interview
time free of speech is chiefly determined by
the amount the interviewer talks. Accordingly
the relation between the amount of time free
of speech in the interview and the decision
may be a consequence of the relation between
the interviewer speaking time and the deci-
sion.

This interpretation leaves, as the central
problem raised by these results, the question
of why the amount the interviewer talks is
directly related to his decision to accept an
applicant.

SUMMARY 


A sample of 115 employment interviews 
conducted by six Army personnel officers was 
examined. The applicant was accepted by the 
interviewer in 70 cases and rejected in 45 
cases. From a recording of each interview 
measures were taken of the time the appli- 
cant spoke, the time the interviewer spoke, 
and the total time. The time vacant of speech 
was determined by subtracting the speaking 
times from the total time of the interview. 

The applicant speaking time and the
vacant time varied inversely with the inter- 
viewer speaking time. The interviewer time 
was greater and the vacant time was less in 
interviews with applicants who were accepted. 

It is suggested that the interviewer influ-
ences both the amount the applicant talks
and the amount of time free of speech
that accumulates during the interview. The
amount the interviewer talks appears to be
directly related to his decision to accept an
applicant.


ACCURACY OF EMPLOYEE REPORTS ON
CHANGES IN PAY


EINAR HARDIN AND GERALD L. HERSHEY 1


Labor and Industrial Relations Center, Michigan State University 


In research on social aspects of techno- 
logical or organizational change it is fre- 
quently desirable to obtain information on 
the extent and direction of change occurring 
in various aspects of the work situation of 
employees. Several potential techniques exist 
for securing the desired information, and a 
choice must be made among them. While not 
devoid of usefulness, the records of an or- 
ganization frequently lack sufficient detail or 
coverage to be helpful. Cross-sectional ques- 
tionnaire surveys, which ask questions about 
the status quo prevailing after the primary 
(technological or organizational) change, are 
frequently limited by the lack of really com- 
parable control groups. Longitudinal ques- 
tionnaire surveys, conducted before and after 
the primary change, may be a preferred tool, 
but several factors limit their use. These
factors include inability to gain access to the
research site and to select all the relevant
questions before the primary change, prob-
lems of maintaining rapport in successive
surveys, other possibilities of one survey af-
fecting responses to the next, and the waste
that might be involved in making a “before” 
survey in anticipation of a contemplated 
change that is not brought about or that 
occurs with great delay. The weaknesses of 
both cross-sectional and longitudinal surveys 
may be alleviated, however, by use of ques- 
tions about perceived change, asked in sur- 
veys after the primary change. Some use has 
been made of perceived change questions by 
personnel of the Labor and Industrial Rela- 
tions Center at Michigan State University, by 
the Survey Research Center at the Univer- 
sity of Michigan, and undoubtedly by others 
engaged in social research. 

¹ The study is part of the automation research
program of the Labor and Industrial Relations
Center. Helpful comments on the manuscript were
made by Jack Stieber and Eugene H. Jacobson.

The usefulness of the perceived change
technique depends significantly on the validity
of the perceived change measures. In
the seemingly sole available report on the 
validity of perceived change, Baumgartel 
(1954) stated two findings: (a) change in 
supervisory behavior was perceived more 
often in an experimental group of employees 
where there had been an intensive super- 
visory training program than in a control 
group not subjected to the training program, 
and (b) there was a significant association
between attitudes toward supervisory behav- 
ior and perceptions of change in correspond- 
ing behavior aspects. The former finding was 
interpreted as a demonstration that perceived 
change measures had some degree of validity 
as measures of actual change. However, the 
report left some doubt as to whether the per- 
ceptions of change had their correlates in 
actual changes in supervisory behavior or 
were perhaps induced by the very knowledge 
that the supervisors in the experimental group 
had gone through a training program. The 
latter finding was taken to mean that atti- 
tudes influenced the perception of change, but 
the data presented also seemed consistent 
with the hypothesis that perception of change 
affected the attitudes. 

Like the Baumgartel report, this paper is 
concerned with the relationship between the 
change actually occurring in the work situa- 
tion of an employee and the change which 
the employee, when asked, reports and pre- 
sumably perceives in the same work situation. 
Data are presented on the degree of accuracy 
of employee reports, on the nature of the 
deviations between reports and actuality, and 
on the relationship of the deviations to poten- 
tial explanatory variables. The study is con- 
fined to one aspect of change, that of the 
presence or absence of change in pay received 
on the job. This restriction is an advantage 
in the sense that the pay actually received 
is a matter of record with very few, if any, 
accounting errors, and that actual change can 
therefore be determined almost unambiguously.
A possible disadvantage is that the
findings may lack application to change as- 
pects that are less susceptible of accurate 
measurement and the perception of which 
the individual can verify less easily. 

A limited literature exists on the validity 
of reports on pay received as of a given date. 
Myers and Maclaurin (1943, p. 87) found 
that factory workers wh... interviewed tended 
to overestimate their earnings on previous 
jobs. Keating, Paterson, and Stone (1950) 
reported that interview reports of unemployed 
workers about wages on their last jobs corre- 
lated very highly with wages according to 
employer records without tendency toward 
overstatement or understatement. The dis- 
crepancy between the Keating, Paterson, and 
Stone finding and the Myers and Maclaurin 
finding as to overstatement may justify addi- 
tional study of the validity of perception of 
pay as of a given time. Such work need not 
be helpful, however, in appraising the validity 
of perceived change, for perception of change 
might conceivably be governed by other fac- 
tors than is perception of the status quo. The 
present study supplies additional information 
on the validity of perception of pay as of a
given date and also presents evidence on the
relationship of the validity of perception of 
change to that of the status quo. 


DATA 


The data of this study consist in the responses of 
employees to two questionnaire surveys undertaken 
in an insurance company that installed an electronic 
computer for data processing purposes, and in personnel
records information on weekly salary rate
for each employee. The salary data were collected for
each of the two survey dates (November 19, 1957 
and May 20, 1958) and for a date six months prior 
to the first survey. 

The first questionnaire was administered about one 
month before the physical installation of the elec- 
tronic computer, and the other survey was made 
when affected employees had one to four months’ 
experience with automated procedures. The question- 
naires were designed to study employee responses to 
the technological change represented by the com- 
puter installation. Included were items dealing with 
perception of change in the industry, in the com- 
pany, and on the job, with employee likes and dis- 
likes relative to these changes (or lack of change), 
with expectations and experiences relative to the 
computer, with general and specific job satisfaction, 
with a series of personality and attitude variables, 
and with personal data. Both questionnaires used a
detachable name coupon which the respondents
completed and deposited in a separate “ballot box.”
The employees received, completed, and returned
the questionnaires on company time and premises
under exclusive supervision by Labor and Industrial
Relations Center personnel. Prior letters from the
company president and the Center, questionnaire
face sheets, and oral presentation at the time of
administration informed the employees about the
general nature of the research and gave them the
customary guarantees of protection of identity. A
total of 581 questionnaires were distributed, two
were not returned, and one was returned too incomplete
for use. Some coupons lacked names, but all were later
identified by use of personnel records. In consequence,
the two surveys yielded 283 and 295 usable
questionnaires, 246 of which came from persons
participating in both surveys. Most of those
participating in only one survey were not employed in the
company at the time of the other survey. Vacation,
illness, absence on company business, and one refusal
accounted for the nonparticipation of remaining
eligible employees. Employees who were in
building maintenance or part-time jobs, who spent
most of their time away from the office, or who
were company officers were excluded from the surveys.
The rates of nonresponse to items in the 578
questionnaires were very low except for questions
not relevant to certain employee groups, questions
dealing with expectations or of a technical nature,
and two questions about social class.

A check list dealing with perceived change in each
of 14 job aspects was used in identical form in both
surveys. The last of these was the pay aspect,
described as “The amount of pay I get on my job.”
Conceptually, each of the job aspects had a quantity
dimension (amount of variety, amount of supervision,
amount of skill needed, amount of pay,
etc.). The respondent was instructed to answer the
question “How has this aspect of your job changed
in the past six months?” by checking one of the
responses “much more now,” “more now,” “no change,”
“less now,” and “much less now.” Those
who had held other jobs in the company six months
before the survey in question were instructed to
make comparisons between their jobs at the survey
date and their previous jobs as of six months before
the survey date. Those hired less than six months
before the survey were asked to indicate what
changes had occurred since they came to the company.

The frequency distributions of responses to the
question about amount of pay are shown in Table
1. All employees in the survey were salaried, and
there were no overtime, bonus, or commission payments.
All salary changes were the results of discrete
acts by the employer and could therefore be identified
readily. The absolute changes in actual weekly
salary from six months before the survey to the survey
date are shown in Table 2; they were all
increases.


TABLE 1
PERCEIVED CHANGES IN SALARY OVER TWO SUCCESSIVE SIX-MONTH PERIODS
(PERCENTAGE FREQUENCY DISTRIBUTIONS)

Columns: Much More Now; More Now; No Change; Less Now; Much Less Now; No Answer
Rows: First survey (Employed six months earlier; Not employed six months earlier);
Second survey (Employed at first survey date; Not employed at first survey date)

* Less than 0.5%.


TABLE 2
ABSOLUTE CHANGES IN ACTUAL WEEKLY SALARY OVER TWO SUCCESSIVE SIX-MONTH PERIODS
(PERCENTAGE FREQUENCY DISTRIBUTIONS)

Column heads: None; Increase in Dollars per Week (.. 5 or More); Not Available; N

First-survey participants     41   ..   20   i   11   246
Second-survey participants    42   ..   16   8   14   263


In the first survey the respondents were asked,
“What is your present salary before taxes and other
deductions? ____ dollars. Check whether this is
per month ____ week ____ or 2 weeks ____.”
The salaries reported for monthly or two-week periods
were later converted to weekly rates.

Both surveys contained a check list covering employee
satisfaction with each of the 14 job aspects,
including amount of pay received. The aspects were
described and ordered identically with the aspects
in the perceived change check list. The response
categories supplied were “completely satisfied,” “very
satisfied,” “quite satisfied,” “somewhat satisfied,”
and “not satisfied.” Separate questions were asked
about job aspects that were not readily quantifiable.
Relevant to this paper are the questions, “How do 
you feel about the relationship between you and 
your supervisor?”, “How satisfied are you with the
company you work for?”, and “Taking everything
into account, how satisfied are you with your job?”

Both surveys contained the questions, “Could your 
household live adequately if you were not work- 
ing?”, “Is your household living adequately now?” 
“Are you the only wage earner in your household?”, 
and “Are you the main wage earner in your house- 
hold?” Response categories were “Yes” and “No.” 
In addition there were customary questions about 
age, sex, and education.


FINDINGS 
Degree of Accuracy 


The relationships between actual and reported
change in pay over the two 6-month
periods are shown in Table 3. Only those
respondents are included under “First survey”
who were employed six months prior to
the survey and for whom actual salary data
for both dates and perceived change responses
were available. The column labeled “Second
survey” covers respondents according to the
same rule but with a six-month time shift.
The observations in the two parts of the table
therefore describe largely the same individuals
at different times and under partly
different circumstances.


TABLE 3
RELATIONSHIP BETWEEN ACTUAL AND REPORTED CHANGE IN PAY OVER TWO SUCCESSIVE
SIX-MONTH PERIODS (PERCENTAGE FREQUENCY DISTRIBUTIONS)

Classification of Respondent*             First Survey   Second Survey
A. Actual and reported change                 3..             39
B. Actual and reported no-change              3..             36
C. Actual change, reported no-change          ..              19
D. Actual no-change, reported change          ..              ..
Total number of respondents in A-D            241             260

* The responses “much more now,” “more now,” “less now,”
and “much less now” were counted as “reported change,”
while the response “no change” was counted as “reported no
change.” Any change, no matter how small, in actual salary
was counted as “actual change.”


TABLE 4
INTERPERIOD RELATIONSHIP IN ACCURACY OF EMPLOYEE REPORTS ON PAY CHANGES
(DISTRIBUTION OF RESPONDENTS)

Rows: First Period (A, B, C, D); columns: Second Period (A, B, C, D) and Row Totals

30 8 71
33.9) (12.0)
i4
(17.8)
10 9 . 45
(12.0 (6.3) (3.9) (40.0)
12 0 20
7.4
D 4) : 3.9) (1.0) (17.0
Column Totals   81 81 3: 14 211
                (76.0) (78.0) (17.0)

Categories A and B in Table 3 represent 
correct reports, while Categories C and D 
represent “failure to report” actual change 
that did occur and “false reporting” of change 
where none occurred. Correct responses were 
given by 68% of the respondents in the first
survey and by 75% of the respondents in the second
survey, the difference being statistically non-
significant. The association between actuality
and reporting of change fell far short of per-
fection but was statistically significant in
both surveys (χ² = 34.9 and 67.2, df = 1).
The data on reported change were conse- 
quently valid, in some degree, as measures of 
actual change in pay. 
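
The test of association reported above can be sketched in Python. Because the cell counts of Table 3 are not fully legible, this sketch uses the first-period and second-period classification totals of Table 4 instead (71, 75, 45, 20 and 81, 81, 35, 14 for Categories A, B, C, D among the 211 respondents with data for both periods). The results therefore approximate, but do not reproduce, the published values of 34.9 and 67.2. The function name and the choice of an uncorrected Pearson statistic are assumptions, not the authors' stated procedure.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: actual change / actual no-change.
# Columns: reported change / reported no-change.
# So the cell order is A, C, D, B in the paper's lettering.
first_period = chi_square_2x2(71, 45, 20, 75)
second_period = chi_square_2x2(81, 35, 14, 81)

print(round(first_period, 1), round(second_period, 1))
```

Both statistics far exceed 3.84, the 5% critical value for one degree of freedom, which agrees with the paper's conclusion that reported change carries some information about actual change.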

The proportion of failures to report was 
significantly higher than that of false report-
ing. This was true in both surveys (χ² = 7.8
and 9.2, df = 1). Employee reports, it is
seen, understated the actual frequency of
change.


Individual Differences in Accuracy


Data on reported and actual change in pay 
in both periods could be compiled for 211 
respondents and are shown in Table 4. The 
classifications A, B, C, and D are identical 
with those used in Table 3. The integers show 
the observed frequencies of respondents. 

It was hypothesized that all deviations between
perception and actuality could be explained
in terms of (a) a single population
proportion of failure to report and (b) a
single population proportion of false reporting.
The former proportion was estimated by
taking the weighted average of the two observed
proportions, that is p₁ = (45 + 35)/(71 + 45 + 81 + 35) = .345.
The latter proportion was similarly estimated as
p₂ = (20 + 14)/(75 + 20 + 81 + 14) = .179. The estimated
proportions of correct reporting of
actual change and of actual absence of change
were consequently q₁ = 1 − p₁ = .655 and
q₂ = 1 − p₂ = .821. The frequencies of actual
pay change in both periods (n = 53), in
neither period (n = 32), in only the first
period (n = 63), and in only the second period
(n = 63) were regarded as given and
were compiled directly from Table 4. The 16
expected frequencies of respondents were calculated
by multiplying each of these four frequencies
with the four products of relevant
pairs of proportions. (For instance, the expected
frequencies were calculated as 53q₁² = 22.7
for Cell AA, as 53p₁q₁ = 12.0 for
Cell CA, and as 63p₁q₂ = 17.8 for Cell CB.)
The chi squared value for the 16 cells (excluding
the irrelevant cells in the marginal
column and row) was calculated. Ten degrees
of freedom were available, since 4 of the
original 16 df for these cells were used in
calculating the four frequencies of actual
change and absence of change and since two
degrees were used in estimating the proportions
p₁ and p₂.
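
The expected-frequency calculation described above can be sketched as follows. The counts are those quoted in the text; the variable names (`p_fail`, `p_false`, `expected`) and the dictionary layout are illustrative assumptions. A cell is keyed by (first-period category, second-period category), which matches the paper's examples for Cells AA, CA, and CB. Since the sketch keeps the proportions unrounded, Cell AA comes out near 22.75 rather than the printed 22.7.

```python
from itertools import product

# Pooled error proportions, from the totals quoted in the text.
p_fail = (45 + 35) / (71 + 45 + 81 + 35)   # failure to report an actual change
p_false = (20 + 14) / (75 + 20 + 81 + 14)  # false report of a change

# Probability of each category given the actual state.
# A and C arise only when pay changed; B and D only when it did not.
prob = {"A": 1 - p_fail, "C": p_fail, "B": 1 - p_false, "D": p_false}
changed = {"A": True, "C": True, "B": False, "D": False}

# Respondents by actual pay behavior: (changed in first period, changed in second).
n_actual = {
    (True, True): 53,
    (False, False): 32,
    (True, False): 63,
    (False, True): 63,
}

expected = {
    (first, second): n_actual[(changed[first], changed[second])]
    * prob[first]
    * prob[second]
    for first, second in product("ABCD", repeat=2)
}

# 16 cells, less 4 fixed frequencies, less 2 estimated proportions.
degrees_of_freedom = len(expected) - len(n_actual) - 2

for cell in [("A", "A"), ("C", "A"), ("C", "B")]:
    print(cell, round(expected[cell], 1))
```

The 16 expected frequencies sum to 211, the number of respondents with data for both periods.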

The deviations between observed and expected
frequencies were found to be statistically
nonsignificant (χ² = 11.9, df = 10).
This confirmed the hypothesis that the entire
distribution of deviations between perception
and actuality was consistent with the two
overall error proportions p₁ = .345 and
p₂ = .179. Accuracy of reporting for one period
was consequently independent of both accuracy
and actual pay behavior in the other
period. The independence of errors in the two
surveys implies that degree of accuracy was
not a persistent characteristic of the individual
and that, expressed differently, there
were no individual differences in accuracy of
employee reports on changes in pay. It would
follow that accuracy of reporting could not
be significantly associated with age, sex, education,
economic status, job satisfaction, and
other stable or only moderately fluctuating
differential characteristics of the individual;
tests of association in these respects would be
superfluous. Accuracy could at most be associated
with the magnitude of the stimulus
(the size of the pay increase) and with attributes
characteristic of the individual at only
one of the survey dates.
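
As a rough check on the nonsignificance of the goodness-of-fit statistic, the upper-tail probability for a chi-square of 11.9 with 10 degrees of freedom can be computed directly. This sketch relies on the closed-form survival function that holds for even degrees of freedom; the function name is an assumption, and the paper itself reports no p value.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variate; closed form for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    # Sum (x/2)^k / k! for k = 0 .. df/2 - 1, building each term from the last.
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


p_value = chi2_sf_even_df(11.9, 10)
print(round(p_value, 2))
```

A probability of roughly .29 is well above .05, in line with the reported nonsignificance. Failing to reject the two-proportion model is not the same as demonstrating that individual differences are absent, which is why the further tests that follow remain worthwhile.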

However, the findings were based on rela- 
tively small samples. Caution therefore sug- 
gested that available data be used for exam- 
ining some plausible hypotheses which imply 
that accuracy was an individual matter. Tests 
of such hypotheses are reported following an 
analysis of the relationship of accuracy to the 
size of the pay increase and to the perception 
of change in various aspects of the job. 


Size of Pay Increase 


The size distributions of actual changes in 
weekly salary were shown in Table 2. Most 
of the increases were either merit increases or 
raises associated with a new employee’s pro- 
bationary period and were rather small. The 
few large increases were associated primarily 
with transfers, promotions, or other substan- 
tial changes in job duties. The merit increases 
tended to be larger in absolute amounts in 
higher salary brackets than in the lower ones, 
making the variation in percentage increases 
even smaller than the variation in absolute 
increases. 

In the second survey, 44% of those receiv- 
ing increases of one or two dollars per week 
between November 1957 and May 1958 failed 
to report them. Those whose pay rose by 
three or four dollars as well as those getting 
increases of at least five dollars failed to 
report the pay changes in 29% of the cases.
Statistically the relationship between size of
pay increase and accuracy was barely significant
(χ² = 6.3, df = 2). There was no significant
association in the first survey between
size of actual increase and failure to report. 
However, those getting raises of at least three 
dollars tended to be more accurate in their 
reports. This weakness of the relationship be- 
tween size of pay increase and accuracy, 
which was quite unexpected, might be due to 
the smallness of both the sample and the pay 
increases. 


Reporting of Change in Other Job Aspects 


Those who failed to report actual pay 
changes were slightly less inclined to report 
increases in the other 13 job aspects of the 
perceived change check list than were em- 
ployees who correctly reported actual pay 
increases. Those who falsely reported pay 
changes, although none had occurred, re- 
ported changes in the other 13 aspects more 
frequently in the first survey, and less fre- 
quently in the second survey, than did those 
who correctly reported there had been no 
change in pay. However, the differences were 
in the nature of weak tendencies: they were 
significant at the 5% level only for five of 
the aspects and then only on one or two of 
the four comparisons. It is concluded that 
failure to report and false reporting did not 
result from a response set in filling out the 
check list or from tendencies to perceive 
change, or absence of change, in other job 
aspects. 


Accuracy of Reports on Current Pay 


As mentioned in the Data section the em- 
ployees were asked in the November 1957 
survey to state their current salaries before 
taxes and other deductions. Of the 269 respondents
answering this question, 73% gave
exactly correct answers, 4% overstated and 15%
understated them by less than 6%, and 2%
overstated and 6% understated them by more
than 6%. It is seen that
understatement was significantly more common
than overstatement (χ² = 23.7, df = 1).
However, the Pearsonian coefficients of correlation
between actual and reported salaries
were r = .98 for women (N = 211) and
r = .99 for men (N = 58).

Those whose pay had actually increased
since May 1957 and those whose pay had not
changed were equally often erroneous in their
reports on November pay. However, understatement
accounted for 93% of the errors
in the former group but only for 58% in the
latter group, the difference being statistically
significant (χ² = 6.6, df = 1). Understatement
of current pay therefore appeared to result
essentially from the recency of the pay
change.

There was no significant association be- 
tween errors in reporting current pay and 
errors in reporting changes in pay. 


Economic Characteristics of Respondent


It was originally thought that those for 
whom salary received was economically very 
significant would differ in accuracy of report- 
ing change from those to whom the salary 
mattered less, but no hypothesis was selected 
concerning the sign of the difference. Failures 
to report were found to be equally frequent, 
however, among those whose households could 
live adequately without the respondent’s in- 
come and those whose households could not. 
Whether the respondent was the sole wage 
earner in the household, the main but not the 
sole wage earner, or only a supplementary 
wage earner was also unrelated to failure to 
report. The same lack of significant differ- 
ences was found for false reporting. 


Job Satisfaction 


A complex of considerations suggested that 
accuracy of reporting on changes in pay 
might be associated positively or negatively 
with employee feelings regarding pay or other 
aspects of the job. Several comparisons of
satisfaction and perceptual accuracy were
made.

The accuracy of reporting in the second
survey was compared with satisfaction with
pay as reported in the first survey. Failures
to report were found equally often among
those dissatisfied with pay and among those
satisfied with it. The same was observed with
respect to false reporting.

Since pay changes were made by the respondent’s
supervisor acting on behalf of the
company, it might be suggested that errors
in the reporting of pay would be associated
with the respondent’s attitude toward the
supervisor and the company. However, no
such association was found. Furthermore,
overall job satisfaction was also found to be
unrelated to accuracy of reporting. 

A separate series of tests were made, using 
the job satisfaction responses of the second 
survey together with the accuracy-of-report- 
ing data of that survey. No significant asso- 
ciations were found in this series either. 

The findings strongly suggest that percep- 
tion of pay change was not influenced by 
feelings about the level of pay, the person or 
organization making the pay change, or the 
general setting in which the pay change took 
place. 

It is interesting to note, however, that 
actual as well as perceived change in pay 
apparently influenced satisfaction with pay. 
Those whose pay actually changed from the 
first survey date to the second became more 
satisfied with pay, while those who received 
no pay change tended to become less satisfied 
(χ² = 17.9, df = 1). Those who failed to report
the increase showed significantly less
rise in satisfaction than those who did report
it (χ² = 6.4, df = 1).


Personal Data 


There was no significant association be- 
tween accuracy of reports on change in pay 
and the respondent’s age, sex, or years of
education.


SUMMARY AND DISCUSSION 


Responses of salaried insurance company 
employees to questions, asked in two successive
questionnaire surveys, concerning changes
in amount of pay they had experienced during
the preceding two 6-month periods were
compared with actual changes in amount of 
pay received. Correct responses were given 
by 68% of the respondents in the first survey 
and by 75% in the second. Statistically the 
association between actual and reported
change was very significant, which demon- 
strated that perceived change in pay had 
some degree of validity as a measure of actual 
change in pay. However, the degree of accu- 
racy was much less than perfect. Further- 
more, because failure to report actual pay 
change was significantly more common than 
reporting of change that actually did not 
occur, perceived change represented a biased
estimate, understating the actual frequency
of change. 

Correctness of response in the first survey 
was not significantly associated with correct- 
ness of response in the second survey, which 
suggested that there were no individual differ- 
ences in the probability of erroneous report- 
ing of either kind. The inference that accu- 
racy was not associated with relatively stable 
differential characteristics of the respondents 
was supported by other findings: (1) Accu- 
racy in reporting was not associated with 
wage earner status, adequacy of household 
income, age, sex, and years of education. 
(2) Satisfaction with pay, relationship to 
supervisor, company, and the job in general 
showed no significant relationship to accuracy 
of reporting. 

Although current pay as reported by the 
respondents correlated very highly with actual 
pay (r= .98 for women and r= .99 for 
men), 27% of the respondents gave incorrect 
reports, and underreporting was significantly
more common than overreporting. 

Accuracy in reports on changes in pay was 
not associated with accuracy in reports on 
current pay. This suggested that whatever 
factors may have operated differentially upon 
individuals to cause inaccuracy of reports on 
current pay did not contribute to inaccuracies 
in reporting on changes in pay. 

Accuracy in reporting on changes in pay 
was only weakly related to the size of the 
pay change and to changes perceived in as- 
pects of the job other than pay. The latter 
finding suggested that inaccuracy in reporting 
was not simply the manifestation of a re- 
sponse set present in completing the perceived 
change check list employed in the study. 

The findings as a whole suggest that errors 
in reporting changes in pay could be regarded 
as random relative to the individual. This 
conclusion contradicts the notion that favor- 
able attitudes lead to perception of positive 
change, at least for perception of change in 
pay. It points up the need to ascertain what 
determines the probability of failure to report 
and the probability of false reporting. 

Assuming that the estimated accuracy of 
employee reports on change in pay is representative
of the validity of perceived change
questions, is it warranted to rely on the perceived
change technique in research on social
aspects of technological or organizational
change? The answer depends in part on the
availability and validity of other techniques.
It is obvious, however, that the perceived
change measure leads to biased estimates of
the differences among any two groups in the
proportions of persons actually exposed to 
change. It can be shown that the bias consists 
in an understatement of the difference and 
that its expected value is the sum of the two 
error probabilities times the true difference 
in proportions between the groups. Hence, 
the rank order of groups on the basis of per- 
ceived change would not be reversed relative 
to the ranking on the basis of actual change, 
unless the sum of the two error proportions 
exceeded unity. But this is scant consolation, 
because the conclusion holds only in the long 
run, and sampling variations in the error pro- 
portions might easily cause inversions in rank 
order. For example, in one use of the present 
data, six departments were ranked (a) on 
the basis of actual proportion of pay changes 
and (b) on the basis of the proportion of
perceived change in pay, and the rank order
correlation was found to be rho = .77. 
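
The claim about the bias can be illustrated with a short sketch. If a fraction of a group is actually exposed to change, the expected share reporting change is that fraction times the probability of a correct report, plus the remaining fraction times the probability of a false report. The difference between two groups is therefore multiplied by one minus the sum of the two error proportions. The group proportions of .60 and .40 below are hypothetical, chosen only for illustration; the error proportions are the paper's estimates.

```python
def reported_proportion(true_prop, p_fail, p_false):
    """Expected share reporting change when true_prop of the group actually changed."""
    return true_prop * (1 - p_fail) + (1 - true_prop) * p_false


p_fail, p_false = 0.345, 0.179  # error proportions estimated in the paper
group_a, group_b = 0.60, 0.40   # hypothetical true proportions exposed to change

true_diff = group_a - group_b
reported_diff = reported_proportion(group_a, p_fail, p_false) - reported_proportion(
    group_b, p_fail, p_false
)
bias = reported_diff - true_diff  # equals -(p_fail + p_false) * true_diff

print(round(reported_diff, 4), round(bias, 4))
```

With these error proportions a true difference of .20 shrinks to about .095, and the sign of the difference is preserved because the two error proportions sum to less than one.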

The observed accuracy is sufficiently low
to warrant concern, but it does not in itself
constitute compelling evidence that perceived
change measures are too invalid for use. It is 
recommended that the study be replicated in 
other settings, that coverage be extended to 
other measurable variables, and that a more 
finely graduated set of response categories be 
tried. When comparisons of attitudes and 
salary changes are made on a group rather 
than individual basis or when identified ques- 
tionnaires are used, one does well, however, 
to seek salary information from organization 
records.


RATING RESEARCH PRODUCTIVITY


In a recent article (Stoltz, 1958) the development
of two forced-choice rating scales
for the determination of the productive re- 
search behavior of persons engaged in physical 
science and engineering research was de- 
scribed. The study to be reported here was an 
attempt to determine if a revised version of
these scales could be used in an industrial 
situation other than the one in which the 
original scales were developed to describe the 
research behavior of engineers. Specifically, 
the study was designed to determine whether 
the revised forced-choice scale had general 
applicability. 


METHOD


The Ss used in the study were 50 em- 
ployees of a large, southwestern electronics 
company.¹ All Ss were technically trained en-
gineers engaged in mechanical and electronic
engineering research fields. The Ss were se- 
lected randomly from a research division of 
168 engineers involved in such research proj- 
ects as microwave development, channel car- 
rier systems, airborne weather radar, transis- 
torized high frequency transceivers, and other 
forms of test and communications systems. 
The 50 Ss represented 45 holders of bachelor’s 
degrees, four master’s degrees, and one PhD 
degree. The average age of the Ss was 28.4 
years with the age range from 23 to 41 
years. The range of job experience of the 
sample was from 12 to 134 months, with an 
average job experience of 43.2 months. 

A rating form, currently used in this com- 
pany to rate the job performance of the re- 
search personnel, was used to furnish the 
best available assessment of a man’s research 
productivity in this setting. For the purpose 
of this study it will be considered the local 
criterion and will be referred to as such. 
This rating form consisted of eight graphic
rating scales, seven covering specific job behaviors,
and one overall ability scale.

¹ The authors would like to thank the management
and personnel of the Collins Radio Company
for their assistance in conducting this study.

The forced-choice rating scale with which 
this study is concerned is a revision of the 
scales designated by Stoltz (1958) as the Re- 
search Behavior Descriptions I and II. The 
revised version, Research Behavior Descrip- 
tions III (RBD III) consists of 25 tetrads 
from the original scales selected on the basis 
of an item analysis performed by Stoltz. 

Two top-level supervisors in the division, 
who were well acquainted with the Ss, served 
as raters. The raters were given a list of the 
randomly selected Ss that were to be rated. 
Through a letter of explanation, as well as a 
verbal explanation, the raters were informed 
of the purpose of the study and the procedure 
to be followed. Each of the two raters made 
an independent rating of each S on both the 
local criterion and the RBD III. Then the 
two raters discussed the ratee and their rat- 
ings and reached an agreement on the rating 
to be given. The raters had no previous knowl- 
edge of the RBD III other than the instruc-
tions given for its completion, but they had
considerable experience and familiarity with 
the local criterion. 

The data were analyzed by computing the 
product moment correlations between the 
local criterion and the RBD III.²


RESULTS 


Two local criterion scores were developed
for each of the Ss. The first score, the overall
productivity score, consisted of the rating of
the S on the overall ability scale of the local
criterion. It was found that full use of the
5-point scale for this characteristic was not
made by the raters, who tended to use only
two of the five positions on the scale. All
correlations regarding this overall productivity
score have consequently been corrected
for coarseness of grouping. The second score
consisted of the average rating of the S on
the seven specific job behavior scales of the
local criterion. This score will be referred to
as the average job behavior score. Table 1
shows the correlations between the overall
productivity scores (corrected), averaged job
behavior score, and the RBD III scores.

² All computations were done with the assistance
of the Southern Methodist University Computing
Laboratory on the Univac 1103.


DISCUSSION 


As Table 1 indicates, the two scores derived
from the local criterion were highly
intercorrelated. The correlation of the RBD 
III with both of these scores was significant 
beyond the .01 level. The slightly higher cor- 
relation of the RBD III with the averaged 
criterion score is undoubtedly due to the 
greater variability of the criterion scores. 
This study was not an attempt to validate 
the RBD III as a measure of productivity. 
It would seem from its correlations with the
supervisory ratings in this study, as in the
study by Stoltz, that the RBD III has validity
if one is willing to accept supervisory judgments
of a person's creative activity as meaningful.
The correlations between
RBD III and the overall criterion simply
represent the relationship between ratings obtained
with one device and ratings obtained
with another device. The principal merits of
the RBD III as a rating form are its some- 
what disguised purpose, its greater freedom 
from intentional bias, and its tendency to 
show greater relative variability than overall 
productivity ratings. The extent to which the 
RBD III is related to other criteria of re- 
search productivity remains in question. How- 
ever, the study does suggest the generality of 
applicability of the RBD III. While the RBD
III was developed within one installation to
measure supervisors’ ratings of productivity,
it now appears that it can be used in at least
one other setting to predict their supervisory
ratings. A tenuous generalization would be
that the RBD III could be used in other
physical science research settings, where other
criteria of research productivity are absent,
to provide estimates of a person’s productive
research behavior.


TABLE 1

RATING INTERCORRELATIONS

                        Overall         Averaged
                        Productivity    Job Behavior

RBD III                    .59*            .61*
Overall Productivity                       .94*

* Significant beyond the .01 level.


SUMMARY 


The RBD III, a forced-choice rating form 
to provide scores indicative of a person’s pro- 
ductive research behavior in physical science 
research settings, was administered in a set- 
ting other than the one in which it was 
developed. The results would suggest that the 
RBD III can be used to provide criterion 
scores for research productivity in other 
physical science research settings. 
It is generally recognized that the acceptance
of decisions by group members is increased
through their participation in making
the decision. Those who favor group decision 
emphasize the factor of acceptance (e.g., 
Coch & French, 1948; Gordon, 1955; Maier, 
1952). The recognized merits of group deci- 
sions and permissive forms of leadership are 
primarily confined, however, to their motiva- 
tional value and to the improved communi- 
cation they afford (Maier, 1952). The higher 
quality of group decisions and group problem 
solutions is less generally accepted (e.g., 
Faust, 1959; Marquart, 1955; Taylor, Berry, 
& Block, 1958). Groups certainly differ in 
their ability to solve problems and the results 
of experiments comparing the quality of 
group and individual problem solving are, 
taken as a whole, inconclusive (e.g., Barn- 
lund, 1959; Faust, 1959; McCurdy & Lam- 
bert, 1952; Taylor & Faust, 1952). 

One of the sources of resistance to the use 
of group decision in organizations is the fear 
that the group products may be of poor qual- 
ity. The same type of fear arises when efforts 
are made to increase delegation by managers. 
If the superior is held responsible for his 
decisions, he feels more comfortable if he 
has control over them rather than permitting 
subordinates to make them. 

Maier (1950) previously has emphasized 
the need for leadership skills in upgrading 
the products of group decision, especially em- 
phasizing the skills in stating a problem and 
asking good questions. Research evidence for 
this position has more recently become avail- 
able. Maier and Solem (1952) found that 
quality could be improved merely by giving 
minority opinions a chance to be heard and 
not subdued by social pressure. Maier and
Maier (1957) demonstrated that the product
of group decision was improved through the
use of the developmental discussion method,
in which the problem was broken into parts,
as compared with the free discussion method,
in which no control over the orderly discussion
of the topic was exerted.

1 This investigation was supported by a USPHS
research grant (M-2704) from the National Institute
of Mental Health, United States Public Health
Service.

Recently Maier has indicated that inferior 
products in group thinking may be caused by 
the leader being overly concerned with a solution,
so that the problem is not fully explored
(Maier, 1958). It appears to be a human 
tendency to seek solutions even before the 
problem is understood. This tendency to be 
“solution-minded” seems to become even 
stronger when there is anxiety over the nature 
of the decision. When a superior in an organi- 
zation offers a problem to his subordinates 
for group solution, he may, in his anxiety to 
obtain a high quality decision, become overly 
solution-minded—and cause his subordinates 
to become so also—and tend to exert undue 
pressure in the direction of a particular solu- 
tion. However, by directly or indirectly exert- 
ing control over the decision, he invariably 
limits the effectiveness of the problem solving 
since his subordinates tend to concern them- 
selves with determining his state of mind, 
rather than solving the problem. 

To avoid this possibility and to obtain high 
quality solutions from their subordinates, 
Maier (1958) has recommended that superiors
adopt an attitude of “problem-mindedness”
to replace the attitude of “solution-mindedness.”
He also mentioned the desirability
of separating the idea-getting activity
of group processes from the evaluative 
process, since these interfere with each other. 

One way of testing these hypotheses is to
require a group to obtain a second solution
to a problem after a first one has been produced.
If a group has solved a problem and
there has been undue pressure exerted in the
direction of one solution, the mere request
to obtain a second solution should permit
problem issues, previously overlooked or suppressed,
to be raised and exert an influence.
If it is desirable that the period of idea- 
getting be increased, this simple request for 
a second solution should give rise to a re- 
newed problem-mindedness and search for 
new ideas. It is believed further that this 
procedure will allow ideas to more readily 
come to the fore, since the needs to be critical 
and evaluative have been partially satisfied 
in the process of obtaining the first solution. 
Thus, the second set of solutions can be ex- 
pected to be superior to the first set. 

In asking for a second solution, however, 
one runs the risk of gaining quality while 
losing acceptance. Having arrived at one solu- 
tion to the problem through a discussion of 
the pros and cons of various alternatives, the 
group members usually feel highly committed 
to their decision. Bennett (1955) has shown 
that merely arriving at a group decision, 
especially where group consensus about the 
decision is perceived by the members, pro- 
duces strong forces in the individual to act 
on the basis of the decision, implying strong 
individual commitment to the decision. Fur- 
thermore, Hovland and his associates (1959) 
have shown that people who are committed 
to a particular attitude position are more 
resistant to persuasion attempts to change 
their attitudes than are people who are not 
so committed. Thus, although groups might
produce a second solution to a problem on
request, the commitment to the first solution,
produced by the process of reaching agreement,
might militate against acceptance of
the second one.

On the other hand, if the second solution
were to be obviously superior to the first one,
the group members might conceivably shift
their preference to that one. Since a high
quality solution with low acceptance may be
inferior to a low quality solution with high
acceptance, it also is necessary to determine
the relative satisfactions of the group members
with the first and second solutions.


METHOD 
Subjects 


One hundred undergraduate students in the eight
laboratory sections of the course Psychology of
Human Relations were assigned randomly to 25
groups of four to solve The Change of Work Procedures
problem (Maier, 1955, p. 314). In each
group the roles of the foreman and three workers,
Jack, Walt, and Steve, were assigned randomly to
the four members.

Problem 


The Change of Work Procedures problem is a
four-man role playing case involving a foreman
and three workers who assemble carburetors in an
automobile factory. The assembly operation is divided
into three positions and the workers adopted
a system of hourly rotation among the three jobs.
The role playing consists of a meeting called by the
foreman to discuss the possibility of their changing
their work method to one in which each man works
on one position only, his best position according to
time-study data given to the foreman. Although
theoretically the new method should increase the
productivity of the workers and thus increase their
piece-rate wages, the foreman’s suggestion of a
change to the new method usually meets with considerable
resistance. The workers express resentment
at the time-study man’s “spying” on them, are
suspicious of management’s motives in suggesting
the new work method, and fear that the boredom
resulting from working on a single job the entire
day will counteract any gains obtained from exploiting
each worker’s aptitude for a special position.

The solutions to the problem can be classified,
for the most part, into three types: old, new, and
integrative, with a number of somewhat different
possible solutions for each type. Old solutions are
those in which (a) the existing hourly rotation
method is retained, or (b) some other period of
rotation is introduced, but where equal amounts of
time are spent on each of the three positions. New
solutions accept the supervisor’s suggestion of working
fixed positions, either completely or with some
modification such as rest pauses, music, or coffee
breaks. Integrative solutions are attempts to exploit
the individual differences in ability, while avoiding
the unfavorable effects of monotony. Such things as
rotating between the two best jobs or putting more
time in on the best job are included in this category.
Since only the integrative solutions require the group
to develop a new work method (the old and new
solutions being provided in the roles), the integrative
solutions are considered to have the highest
quality. Whether asking for a second solution increases
quality will be determined by whether the
proportion of integrative solutions is increased in
the second over the first set of solutions.

Procedure 


The instructor in each laboratory section used the
multiple role playing procedure (Maier, 1952, p.
146) to conduct the case, reading the general instructions
to the entire class and distributing the roles
to each group member. At a signal, the groups
simultaneously solved the problem, and, upon completing
the discussion, the “foreman” wrote out the
solution arrived at and all group members completed
a satisfaction questionnaire individually. During the
completion of these tasks, the groups were asked
to refrain from discussing the case since more work
was to follow. A 2-min. warning was given at the
end of 18 min. to those groups which had not finished
and all problem solving was stopped at the
end of 20 min. Following the collection of solution
sheets and satisfaction questionnaires from the first
solution, the instructors asked the groups to resume
discussion and obtain another solution to the problem.
The role assignments for work on the second
solution were the same as for the first, and again,
solution sheets and satisfaction questionnaires were
completed. Although no time limit was set for the
second solution, all groups finished within 20 min.
The time taken to solve the first and second problems
by each group was secretly recorded by the
instructor.

Except that the problem solutions were written
out and members’ expressions of satisfaction with
the solutions were obtained by questionnaires (both
usually being reported verbally), the procedure used
in the study was identical with the usual way of
conducting these laboratory sessions.

The identical procedure was used to collect first
and second solutions to this problem from 29 groups
of nurses attending a one-day conference on motivation
conducted by the senior author. The number
of groups was too large, however, to collect measures
of satisfaction or to record the time taken for
solution.


TABLE 1

DISTRIBUTIONS OF FIRST AND SECOND SOLUTIONS TO CHANGE OF WORK PROCEDURE PROBLEMS

                        Old     New     Integrative    Total    N

Students
  First solutions                          16.0        100.0    25
  Second solutions                         52.0        100.0    25
Nurses
  First solutions       34.5                6.9        100.0    29
  Second solutions      31.0               34.5        100.0    29


Satisfaction Measure 


To determine the members’ acceptance of the solutions
they were asked, after each solution, to respond
individually to the question, “How satisfied are you
with the (second) solution reached by your group?”
Each S checked one of six alternatives: “Very satisfied,
quite satisfied, fairly satisfied, neither satisfied
nor dissatisfied, somewhat dissatisfied, and very dissatisfied.”
Values from 1 to 6, representing increasing
levels of satisfaction, were assigned to each of these
alternatives, 1 being assigned to “very dissatisfied”
and 6 to “very satisfied.”
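
The scoring rule just described can be sketched as a simple lookup. The text fixes only the endpoints (1 for “very dissatisfied,” 6 for “very satisfied”); the values for the four intermediate alternatives are assumed here to follow the printed order. The names `SATISFACTION_VALUES` and `mean_satisfaction` and the three sample responses are hypothetical.

```python
# Assumed mapping: endpoints come from the text, intermediate values
# follow the order in which the alternatives were listed.
SATISFACTION_VALUES = {
    "very dissatisfied": 1,
    "somewhat dissatisfied": 2,
    "neither satisfied nor dissatisfied": 3,
    "fairly satisfied": 4,
    "quite satisfied": 5,
    "very satisfied": 6,
}


def mean_satisfaction(responses):
    """Average the numeric values of a list of checked alternatives."""
    if not responses:
        raise ValueError("at least one response is required")
    scores = [SATISFACTION_VALUES[r.strip().lower()] for r in responses]
    return sum(scores) / len(scores)


# Hypothetical responses from the three "workers" of one group.
group_mean = mean_satisfaction(["Very satisfied", "Quite satisfied", "Fairly satisfied"])
print(group_mean)
```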

Following the second solution Ss were also asked 
the question, “On which of the two work methods 
proposed by your group would you work faster?” 
They checked one of the five alternatives: “I would 
work much faster on the first than on the second, 
somewhat faster on the first than on the sec- 
ond, about the same on both, somewhat faster on 
the second than on the first, I would work much 
faster on the second than on the first.” 


TABLE 2

TYPES OF SHIFTS FROM FIRST TO SECOND SOLUTIONS AS COMPARED TO EXPECTANCY
(Integrative versus Old and New Solutions)

                        Old or New First Solution      Integrative First Solution
                        Shifted to     Remained        Remained       Shifted to
Group                   Integrative    Old or New      Integrative    Old or New

Students   Obtained        11             10
           Expected
Nurses     Obtained
           Expected

Chi square test significant at the .01 level.
TABLE 3

“WORKER” SATISFACTION WITH FIRST AND SECOND SOLUTIONS
(Student Groups)

                                 All Groups             Groups Shifting to
                                                        Integrative Solutions
                             First      Second          First      Second
                             Solution   Solution        Solution   Solution

Mean satisfaction             4.9        4.9
Standard deviation            1.18       1.27
Number of group members       75         75


RESULTS


The distribution of first and second solutions
to the Change of Work Procedures problem
by the student and nursing groups according
to the three types (old, new, and
integrative) appears in Table 1. The shift
from old and new solutions in the first set
to integrative solutions in the second set is
striking. Whereas only 16.0% of the student
groups produced integrative solutions on the
first try, 52.0% of the second solutions were
of this type. The comparable percentages for
the nurses were 6.9 and 34.5. Since, as indicated
above, the integrative solutions may be
regarded as the highest quality solutions,
these differences represent significant increases
in the quality of the group problem
solving as a result of asking for a second
solution.

The magnitude of the increase in quality
can be seen clearly from the data presented in
Table 2. In that table are shown for both
students and nurses the numbers of groups
which (a) originally produced old or new
solutions and switched to integrative solutions
or again produced an old or a new
solution and (b) originally produced integrative
solutions and shifted to an old or
new solution or produced another integrative
solution. These obtained values are compared
with the values expected if only those
forces were operating in the second problem
solving sessions which operated in the first
ones. Thus, for example, 16% of the 21 student
groups which initially produced old or
new solutions could be expected to shift to
integrative solutions on the second attempt.
Calculating the expectancies in this way, the
chi squares for Table 2 were 19.28 for the
students and 24.67 for the nurses, which are
significant at well beyond the .01 level of confidence.
The significance of the data arises
principally because of the large proportion of
shifts from old and new to integrative solutions.
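
The expectancy rule described above (the first-trial rate of integrative solutions applied again to the second trial) can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the count of 4 integrative first solutions is derived from the reported 16.0% of 25 student groups. The sketch does not claim to reproduce the reported chi squares, since that would require the full set of obtained cell counts from Table 2 and the authors' exact computing procedure.

```python
def expected_shift_counts(n_old_or_new_first, n_integrative_first):
    """Expected second-trial outcomes if the first-trial rate simply repeats."""
    total = n_old_or_new_first + n_integrative_first
    p_integrative = n_integrative_first / total  # first-trial rate, e.g. 4/25 = .16
    return {
        ("old_or_new", "integrative"): n_old_or_new_first * p_integrative,
        ("old_or_new", "old_or_new"): n_old_or_new_first * (1 - p_integrative),
        ("integrative", "integrative"): n_integrative_first * p_integrative,
        ("integrative", "old_or_new"): n_integrative_first * (1 - p_integrative),
    }


def chi_square(observed, expected):
    """Sum of (obtained - expected)^2 / expected over the cells of `expected`."""
    return sum((observed[k] - expected[k]) ** 2 / expected[k] for k in expected)


# Student groups: 21 old or new and 4 integrative first solutions (16% of 25).
student_expected = expected_shift_counts(21, 4)
print(round(student_expected[("old_or_new", "integrative")], 2))
```

Under these assumptions the expected number of student groups shifting to an integrative solution is 16% of 21, which is the example given in the text.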

It will be seen in the left-hand side of
Table 3 that the mean “worker” satisfaction
scores for the first and second solutions, calculated
for all student groups from the values
assigned to the question on satisfaction with
the solution, were identical (4.9). Furthermore,
the members of the 11 groups which
shifted from old or new solutions to integrative
solutions were even slightly more satisfied,
though not significantly so, with their
second than their first solutions (see right-hand
side of Table 3). Thus, improved quality
of solution without any loss of acceptance
was obtained by asking the groups to solve
the problem twice.


TABLE 4

METHOD ON WHICH WORKERS WILL “WORK FASTER”
(Student Groups)

                          Much or Somewhat    About the Same    Much or Somewhat
                          Faster on First     on Both           Faster on Second
                            %      N            %      N          %      N

All groups                 24.3                36.5              39.2    29
Groups shifting to
  integrative solutions    18.2                33.3              48.5    16

Responses to the question “On which of 
the two work methods proposed by your 
group would you work faster?” show some- 
what more preference for the second solution 
than for the first by all the groups (Table 
4). Whereas 39.2% of the group members 
said they would work “much or somewhat 
faster on the second than on the first,” only 
24.3% said they would work faster on the 
first. The preference for the second solution 
was even more marked among the members 
of groups which shifted to the integrative 
solutions. Almost half (48.5%) of these 
group members said they would work faster 
on the second method, while only 18.2% 
said they would work faster on the first. In 
all 81.8% of the members who produced 
integrative solutions on the second attempt 
either said they would work somewhat faster 
on the second method than on the first or 
about the same on both. Thus, further evidence
is provided that solution acceptance
was not sacrificed for solution quality in these
groups and may even have been somewhat
increased. Of those who expressed a preference,
more than two and one-half times as
many preferred the second solution as preferred
the first.
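
The two summary figures in this paragraph follow directly from the percentages reported for the groups that shifted to integrative solutions. A short check, with hypothetical variable names:

```python
# Percentages reported in the text for members of the shifting groups.
faster_on_second = 48.5
faster_on_first = 18.2

# Ratio of second-solution to first-solution preferences among those
# who expressed a preference.
preference_ratio = faster_on_second / faster_on_first

# Everyone who did not prefer the first method: same on both, or faster on second.
same_or_faster_on_second = 100.0 - faster_on_first

print(round(preference_ratio, 2), round(same_or_faster_on_second, 1))
```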

Finally it should be noted that it took 
only about two-thirds (64.7%) as much 
time, on the average, to obtain the second 
solutions as the first. The mean time taken 
for the first solutions was 17.0 min. with a 
standard deviation of 5.18 min., and for the 
second solutions 11.0 min. with a standard 
deviation of 5.60 min. No greater time was 
taken, on the average, by those groups which 
shifted from old or new to integrative solu- 
tions; the means for those groups being 16.8 
and 11.1 min., respectively, for the first and 
second solutions. 
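
The "about two-thirds" figure can be recovered from the reported mean times. The variable names below are hypothetical; the four means are those given in the text.

```python
# Mean solution times in minutes, as reported.
first_all, second_all = 17.0, 11.0          # all student groups
first_shift, second_shift = 16.8, 11.1      # groups shifting to integrative solutions

ratio_all = 100 * second_all / first_all        # second time as a percentage of first
ratio_shift = 100 * second_shift / first_shift

print(round(ratio_all, 1), round(ratio_shift, 1))
```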


DISCUSSION

The results of this study show that the
double-solution method has been a powerful
one for improving the quality of group solutions
to the Change of Work Procedures
problem. They suggest a confirmation of the 
hypothesis that a request for a second solu- 
tion to the problem would renew the “prob- 
lem-mindedness” of the group members and 
permit a renewed search for alternative work 
methods. It is probably also true that the
foremen, having dominated or gotten their
own way in the first solution, were less likely
to push for the new method and more likely
to be responsive to the workers’ views.

Important also is the fact that the im- 
proved quality was obtained without any loss 
of acceptance. Since the workers’ satisfaction 
with their second solution was no less than 
their satisfaction with their first, it would 
presumably be as easy to put the superior 
second solutions into effect as it would the 
first ones. 

Finally, the fact that such striking im- 
provements in quality were obtained at a 
relatively small increment in time spent sug- 
gests the administrative desirability of the 
double-solution method. Presumably some of 
the time spent on the first solution was used 
to discuss the merits and deficiencies of a 
method or methods other than the one agreed 
upon, so that such discussion need not be 
repeated the second time and other alterna- 
tives can be explored. 

By obtaining two solutions to a problem 
a discussion leader can change a problem 
situation into a choice situation. Having 
reached agreement on one solution, he can 
ask the group to explore the problem further 
in the search for other acceptable solutions, 
then give participants the opportunity to 
choose the most acceptable one. Thus, even 
if a second solution turned out to be of in- 
ferior quality, the chance to choose between 
alternatives would still permit the higher 
quality solution to be accepted. This is a 
simple method open to a conference leader 
which can lead to upgrading the quality of 
problem solving discussions without increas- 
ing the basic aptitudes of either the leader 
or the participants. 


SUMMARY 


The present investigation was designed to
determine the effectiveness of a new technique
for improving the quality of group
solutions. The technique studied was to ask
groups to solve a problem twice, on the as- 
sumption that the second problem solving 
session should give the groups the opportu- 
nity to explore a number of solutions and be 
free of the need to make rapid evaluations 
of only one. 

One hundred students in the Psychology of 
Human Relations course were divided into 
25 groups of four each to role play the case 
of the Change of Work Procedures. The roles 
of the foreman and three workers were as- 
signed randomly to the members of each 
group. Each group initially was given up to 
20 min. to reach agreement on a work
method. After all the groups had finished, 
they were asked to arrive at a second solution 
to the problem. 

The solutions to the case were classified 
into old, new, and integrative solutions. A 
significantly higher proportion of second than 
of first solutions was of the integrative type, 
the highest quality solution. Whereas only 
13.6% of the groups which had originally 
produced old or new solutions were expected 
to shift to integrative solutions, 44.0% actu- 
ally did, a difference significant at the .01 
level of confidence. There was no difference in 
satisfaction, on the average, between first and 
second solutions, and the mean satisfaction 
of the members of the groups which shifted 
to integrative solutions was even slightly 
higher for the second than for the first solu- 
tion. Furthermore, a higher percentage of all 
group members said they would work some- 
what faster on the second than on the first 
solution and this percentage difference was 
even greater in those groups which had 
shifted to the integrative solutions. Using the 
double-solution method, then, solution quality 
was generally increased without a loss of 
acceptance. 

The double-solution method also seemed to
require little more time than it took to obtain
one solution. It is suggested that this method
should improve group solutions to problems,
particularly where the leader has a preferred
solution.
This study illustrates the application of
two methodological precepts believed to be 
of value in the development of psychological 
diagnostic instruments. The first of these is 
the transition study (e.g., Campbell, 1957b, 
p. 310): When a new procedure is proposed 
to replace an old one, a transition study 
should be conducted in which both the new 
and the old are jointly applied, and the supe- 
riority of the new thus directly demonstrated. 
When in the late 1930’s, projective tests re- 
placed structured questionnaires as the prin- 
cipal tools of personality and clinical psychol- 
ogy, transition studies were conspicuously 
lacking. Instead, the abundant evidences of
the imperfections of the structured tests
(which their quantitative scores made readily
ascertainable) were used with faulty logic to
argue for the superior validity of the new projective
devices. The present study belatedly
offers such a transition study, comparing two 
sentence-completion tests with two direct atti- 
tude scales designed to measure the same two 
attitude syndromes. 

The second methodological precept to be 
illustrated is the multitrait-multimethod ma- 
trix (Campbell & Fiske, 1959): For the vali- 
dation of a given test as a measure of a given 
trait, a matrix of correlations is required in- 
volving not only measures of the given trait 
through two different methods, but also tests 
resulting from the application of both meth- 
ods to the measurement of some other, pre- 
sumably independent, trait. This is not the 
place to attempt to justify fully this seemingly
dogmatic and arbitrary criterion; suffice
it here to say that the requirement is closely
related both to the concept of construct validity
(Cronbach & Meehl, 1955) and to the
well known threat to validity represented by
irrelevant factors such as halo effects, response
sets, etc. In the present study, the two
methods employed were the sentence completion
test and the structured attitude test. The
two traits investigated were attitude toward
home and parents (H & P), and attitude toward
law and justice (L & J).

1 The statistical analyses reported in this paper
were supported by funds from The Graduate School,
Northwestern University.

2 Now at Harvard University.

3 Now at the Ohio State University.


METHOD 


Forty sentence completion stems, modifications of
the Rotter Incomplete Sentence Blank (Rotter &
Rafferty, 1950), were used in the first part of the
test administration. Ten stems were designed to elicit
responses relevant to S’s attitudes toward home and
parents (e.g., “Back home . . .” “My father . . .”
“When I was a child . . .”) and 10 intended to elicit
responses relevant to S’s attitudes toward law and
justice (e.g., “Any cop . . .” “Our laws . . .”
“My trial . . .”). Responses to every stem were inspected
by the scorers and either left unscored
(where the response was not relevant to either of
the two attitudes under scrutiny) or scored as pertinent
to one or other of the attitudes. Scoring of
responses was along a 5-point scale of intensity of
favorableness of attitude. Details of the test and
scoring are reported elsewhere (Watt & Maher,
1958). Following the sentence-completion items there
were 16 direct attitude statements, each to be answered
by “agree,” “disagree,” or “undecided.” Of
the eight items for each topic, four were worded
favorably and four unfavorably, so as to eliminate
from the total score the effects of acquiescence response
set (Chapman & Campbell, 1959; Cronbach,
1946). Sample items are:

1. Cops often carry a grudge against men who get
in trouble with the law and treat them cruelly.

2. For the most part, justice gets done by the
police and the courts.

3. Did your parents ever fail in their duty to you?

4. Did your parents take a great deal of interest
in you?


To make the score independent of any consistent
individual differences in the frequency of using the
“uncertain” response, the score used was the proportion
of favorable responses (i.e., agreements on
favorable items and disagreements on unfavorable
items) of the total of decisive responses (i.e., all
favorable responses plus all unfavorable responses).
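
The scoring rule for the direct attitude statements can be sketched as below. The function name `favorable_proportion`, the tuple layout, and the eight sample answers are hypothetical; only the rule itself (favorable responses divided by decisive responses, with “undecided” answers dropped) is taken from the text.

```python
def favorable_proportion(items):
    """Proportion of favorable responses among decisive responses.

    `items` is a list of (wording, response) pairs, where wording is
    "favorable" or "unfavorable" and response is "agree", "disagree",
    or "undecided". Returns None if every response was undecided.
    """
    favorable = unfavorable = 0
    for wording, response in items:
        if wording not in ("favorable", "unfavorable"):
            raise ValueError(f"unknown wording: {wording!r}")
        if response == "undecided":
            continue  # undecided answers do not enter the score
        if response not in ("agree", "disagree"):
            raise ValueError(f"unknown response: {response!r}")
        agrees = response == "agree"
        # Agreeing with a favorable item or disagreeing with an
        # unfavorable item both count as a favorable response.
        if (wording == "favorable") == agrees:
            favorable += 1
        else:
            unfavorable += 1
    decisive = favorable + unfavorable
    return favorable / decisive if decisive else None


# Hypothetical answers to the eight items on one topic.
answers = (
    [("favorable", "agree")] * 3
    + [("favorable", "undecided")]
    + [("unfavorable", "disagree")] * 2
    + [("unfavorable", "agree")] * 2
)
score = favorable_proportion(answers)
print(round(score, 3))
```

Because the denominator counts only decisive responses, a respondent who checks “undecided” often is not pushed toward the middle of the scale.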
The Ss were 79 prisoners from the Indiana State
Prison in Michigan City, Indiana, primarily from the
elementary school division of the prison. The Ss were
selected by the parole officer to provide a
variety of crime types, and in terms of accessibility
as dictated by the prison’s work schedule. It is
probable that extremes of mental deficit and of recalcitrancy
were eliminated in this process. However,
since the primary contribution of this study comes
from the correlation of individual differences within
the tested groups, this uncertainty of representativeness,
while undesirable, is judged to be no more disqualifying
than is the similar casualness of the usual
college or clinic sample. The Ss were all tested at
one time in a single classroom, with a minimum of
supervision and a maximum of privacy. Most papers
were completed by the end of one hour. All papers
were anonymous. The final sheet of the questionnaire
asked for a classification of the crime resulting
in imprisonment. The 79 Ss included 22 in for murder,
21 nonviolent theft, 16 “intellectual” crimes of
fraud, forgery, and embezzlement, 10 sex offenders,
and 10 violent crimes such as armed robbery, assault
and battery. The overall distribution of the crimes
reported coincided well with the overall distribution
of the group tested according to the prison records,
and this fact, plus the apparent sincerity of the
answers, lends credibility to the results obtained.


RESULTS AND DISCUSSION 
The Mutual Validation of the Tests 


The multitrait-multimethod validation ma- 
trix obtained is presented in the top four rows 
of Table 1. Judging this outcome not in 
terms of some a priori ideal, but rather in 
comparison with the experience of such ma- 
trices in previous research (Campbell & Fiske, 
1959), this represents an exceptionally good 
validity picture. In the first place, the validity 
values of .51 and .50 are quite high. (Of the 
129 validity values actually tabled in the 
Campbell and Fiske paper, 88% are lower.) 
Second, the validity values are conspicuously 
higher than the heterotrait-heteromethod val- 
ues of .12 and .13, showing good discriminant 
validity in a way often found lacking: the 
correlation between the two types of test is 
quite specific to the concomitant intended 
attitudes or trait content. Third, and most 
surprisingly, method variance seems a very 
small component of total test variance, as 
indicated by heterotrait-monomethod values 
of .05 and —.04. Method variance would be 
indicated by heterotrait-monomethod values
larger than their corresponding heterotrait-heteromethod
values. Usually, as a matter of
fact, the heterotrait-monomethod values are
larger even than the validity values, two traits
measured by the same method correlating
higher than the same trait measured by two
different measures.


TABLE 1

THE MULTITRAIT-MULTIMETHOD VALIDITY MATRIX

                          Structured          Projective
                        H & P    L & J      H & P    L & J

Structured, H & P
Structured, L & J
Projective, H & P
Projective, L & J
Mean Verbal Output

* p < 0.005.

Perhaps this absence of method-specific co- 
variance is due to the unambiguous attitude 
topics employed. For example, Gage et al. 
(1957) found acquiescence response set bias 
only on the more difficult items, and Chap- 
man and Campbell (1959, in press) found 
much more acquiescence bias on the relatively 
amorphous F Scale than on the more concrete 
Ethnocentrism and Manifest Anxiety scales. 
Perhaps the deliberate planning of the tests 
in this study to avoid such chronic sources of 
irrelevant high correlations has paid off—al- 
though even under the best of conditions, we 
would usually expect two different traits to 
correlate higher when both are measured by 
the same method (heterotrait-monomethod 
values) than when each is measured by a 
different method (heterotrait-heteromethod 
values). 
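
The comparison described above can be made concrete with a short sketch. It sorts the six correlations among the four measures into the three classes used in the text (validity, heterotrait-monomethod, heterotrait-heteromethod) and applies the two checks. The six values are those quoted in the text; their assignment to particular cells, and all function and variable names, are assumptions made for illustration.

```python
from itertools import combinations

# Each measure is a (method, trait) pair.
MEASURES = [
    ("structured", "H&P"),
    ("structured", "L&J"),
    ("projective", "H&P"),
    ("projective", "L&J"),
]

# Correlations quoted in the text, keyed by index pairs into MEASURES.
# Which cell carries which value is an assumption for illustration only.
R = {
    (0, 2): 0.51,   # same trait, different method
    (1, 3): 0.50,   # same trait, different method
    (0, 3): 0.12,   # different trait, different method
    (1, 2): 0.13,   # different trait, different method
    (0, 1): 0.05,   # different trait, same method
    (2, 3): -0.04,  # different trait, same method
}


def classify(a, b):
    """Name the class of the correlation between two (method, trait) measures."""
    same_method = a[0] == b[0]
    same_trait = a[1] == b[1]
    if same_trait and not same_method:
        return "validity"
    if same_method and not same_trait:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"


def group_correlations(measures, r):
    """Collect the correlations of each class into a list."""
    groups = {}
    for i, j in combinations(range(len(measures)), 2):
        groups.setdefault(classify(measures[i], measures[j]), []).append(r[(i, j)])
    return groups


groups = group_correlations(MEASURES, R)

# Discriminant validity: every validity value exceeds every
# heterotrait-heteromethod value.
discriminant_ok = min(groups["validity"]) > max(groups["heterotrait-heteromethod"])

# Method variance: some heterotrait-monomethod value exceeds a
# heterotrait-heteromethod value.
method_variance = (
    max(groups["heterotrait-monomethod"]) > min(groups["heterotrait-heteromethod"])
)
```

With the quoted values, `discriminant_ok` is true and `method_variance` is false, which is the pattern the text describes.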

The sources of the strong method factors 
that jeopardize validity in a given type of 
test often can be specified. Thus acquiescence 
response set has been found to be a strong 
source of ‘irrelevant’ correlations among atti- 
tude tests not specifically designed to control 
it. Thus the general response feature of num- 
ber of responses seems to account for the high 
heterotrait-monomethod correlations typical 
of Rorschach studies. MacIntosh and Maher 
have recently pointed out (1958) that mean 
length of response to each item can be such 
a factor in the sentence completion test where 
the scoring is such that, for example, only
symptoms of maladjustment are noted. With
such a key, the more a man writes or talks on
a given item, the more chance he has of pro-
ducing a neurotic sign. The scoring system of
this study is, in contrast, bipolar. The longer
the response, the more chance of both favor-
able and unfavorable clauses. But even so, it
was felt desirable to explore directly the cor-
relate of this trait-irrelevant response at-
tribute. For this reason the mean verbal out-
put per item was computed for all 40 sentence
completion items. This value was then cor-
related with all four attitude scores, with
results as shown in the bottom row of Table
1. The values are all nonsignificant, and com-
pare strikingly favorably with values such as
.78 between verbal output and adjustment
scores found for the Rotter Incomplete Sen-
tences Blank by MacIntosh and Maher
(1958) and .66 between verbal output and
Y score on the Rorschach reported by Lotsof
(1953).


TABLE 2
CORRELATION OF TEST SCORES WITH CRIMES

                     Structured
                     H & P

Murder                .08
Nonviolent Theft     -.22*
Intellectual         -.13
Sex Offender          .11
Violent              [illegible]

*p < 0.10.
**p < 0.05.
***p < 0.01.

All in all, the validity picture is probably 
as good as any so far reported in psychologi- 
cal studies of individual differences. Perhaps 
when tests are designed with direct attention 
to the sources of invalidity made explicit by 
the multitrait-multimethod matrix, we shall 
find a level of validity far better than the 
depressing sample of studies assembled by 
Campbell and Fiske. 




TABLE 2 (continued)

Projective (H & P, L & J) and Mean Verbal Output

.11   -.07   .06
.26    .12   .16
.13   [illegible]
[illegible]   .15   -.09
.15    .30   .21


The Comparative Validity of the Two Types
of Test


In the above analysis the two types of test
have been mutually validating, rather than
showing one method to be more valid than
the other. The validity coefficients of .51 and
.50 reflect equally on both methods. Only in
the magnitude of the irrelevant method- 
variance, in the heterotrait-monomethod r’s, 
could the two methods have shown differen- 
tial validity—and here no differences ap- 
peared. What is needed to compare the rela- 
tive validity of the two methods is a third 
method against which they could each be 
checked. Ratings by guards or cellmates on 
the two attitudes would have done well for 
this purpose, for example. However, these 
were not available. Potentially, behavioral 
measures such as type of crime committed 
could serve this role of the third method type, 
particularly where theory or research experi- 
ence tend to differentially link certain crime 
types with certain attitudes, so that confirma- 
tions of these could be regarded as “validity” 
values. While such a background is as yet 
lacking, an approximation to the approach 
can be stated. Insofar as the attitude scales 
differentiate crime groups, which method pro- 
vides the sharper differentiations, or the higher 
“correlation” with crime type? In particular, 
where both methods show a significant cor- 
relation with a given crime, which one shows 
the higher? 

Table 2 presents the results of such an 
exploration. Since not enough strong differ- 
entiation of crime groups emerged to be de- 
cisive, this must stand as a demonstration of 
a possible method rather than as a conclusive 
study of relative merit. Each entry in Table 
2 reports a “correlation” between an attitude 
score and a crime tendency. Since the attitude 
scales gave the larger numbers to the more 
favorable answers, a positive value indicates 
that the crime group in question is more 
favorable than the other prisoners on this
topic. Thus the .35 value indicates that the 
intellectual criminals have more positive atti- 
tudes toward law and justice on the direct 
structured items than do the other criminals. 

The correlation coefficient chosen for this 
purpose was the biserial r. In spite of several 
assumptions which are not appropriate to this 
use, it seemed the best descriptive technique, 
in that it offers a statement of relationship 
independent of the sample size and the di- 
chotomization marginals. The biserial rs rep- 
resent the relation between the attitude meas- 
ure and a split of the sample into one crime 
group versus all others. Each of these rela-
tionships has likewise been computed as a t
ratio, and the statements of the significance
of the values in Table 2 are based upon these
t ratios rather than upon the biserial rs di-
rectly. (These t ratios are not presented here
as being essentially redundant.) Similarly,
differences between crime groups taken two 
at a time have also been computed, and are 
not reported as no different information 
emerged. 
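
As a rough illustration of the statistic named above, the following sketch computes a biserial r between an attitude score and membership in one crime group versus all others. The paper does not give its computing formula; this uses the standard textbook form, r_bis = ((M1 - M0) / s) * (pq / y), where y is the normal ordinate at the point cutting off proportion p. The data and all names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, pstdev


def biserial_r(scores, in_group):
    """Biserial correlation between continuous scores and a dichotomy.

    scores   -- attitude scores for the whole sample
    in_group -- parallel flags: True if the subject is in the crime group
    """
    ones = [s for s, g in zip(scores, in_group) if g]
    zeros = [s for s, g in zip(scores, in_group) if not g]
    p = len(ones) / len(scores)   # proportion in the crime group
    q = 1.0 - p
    normal = NormalDist()
    y = normal.pdf(normal.inv_cdf(p))  # normal ordinate at the cut point
    return (mean(ones) - mean(zeros)) / pstdev(scores) * (p * q / y)


def point_biserial_r(scores, in_group):
    """Point-biserial r, shown only for comparison with the biserial value."""
    ones = [s for s, g in zip(scores, in_group) if g]
    zeros = [s for s, g in zip(scores, in_group) if not g]
    p = len(ones) / len(scores)
    return (mean(ones) - mean(zeros)) / pstdev(scores) * sqrt(p * (1.0 - p))


# Hypothetical attitude scores; True marks members of one crime group.
scores = [3, 5, 4, 6, 7, 5, 6, 8, 4, 5]
flags = [False, True, False, False, True, False, True, False, False, True]

r_bis = biserial_r(scores, flags)
r_pb = point_biserial_r(scores, flags)
```

The biserial value is always larger in absolute size than the point-biserial value for the same data, and it can exceed 1 when the scores depart strongly from normality, which is one of the assumptions the text acknowledges.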

What we would have liked to have found in 
Table 2 would be a close paralleling of
strongly significant correlations between the
two measurement procedures, with, however,
one method being consistently somewhat su-
perior. To find the significant .35 value be-
tween intellectual crime and the structured
attitude toward law and justice paralleled by 
a trivial .07 when the projective method is 
used is disheartening, and makes one hesitate 
in interpreting the .35 value, statistically sig- 
nificant though it is. While there are no 
parallel pairs of significant values, even when 
the optimistic .10 level is used, there is none 
the less some degree of reassuring parallelism, 
and the .35 versus .07 instance cited is the 
worst disagreement in the whole table. Com- 
paring the values for the five crime groups 
for H & P structured with H & P projective, 
the rank order correlation is .90. For L & J 
the value is .60. A contrast against which to 
examine this agreement can be provided by 
again raising the issue of method commu- 
nality—to what extent do the two measures 
employing the same method order the groups 
in the same way? Pairing the values in this
fashion provides rank order correlations of
-.30 for the structured, and .00 for the pro-
jective. It would seem then that insofar as
there is some overall agreement among the 
tests in the ordering of the crime groups, this 
agreement is a function of content and not 
of test format. 
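
The rank order correlations above compare how two measures order the five crime groups. A minimal sketch of that computation follows, using the usual Spearman formula for untied ranks, rho = 1 - 6 * sum(d^2) / (n(n^2 - 1)). The two columns of values are hypothetical and are not the entries of Table 2.

```python
def ranks(values):
    """Rank values from 1 (lowest) to n (highest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank order correlation for two untied lists of equal length."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical correlations of five crime groups with two measures.
structured = [0.10, -0.20, -0.10, 0.15, 0.00]
projective = [0.20, -0.25, -0.15, 0.12, 0.05]

rho = spearman_rho(structured, projective)
```

With only five groups the coefficient moves in steps of .10: a single interchange of two adjacent groups gives a sum of squared rank differences of 2 and a rho of .90, while sums of 8, 20, and 26 give .60, .00, and -.30 respectively.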

Tentatively assuming some degree of dif- 
ferentiation among crimes by the tests, we 
can now proceed to the question of what 
method produces the greatest differentiation. 
The structured tests provide more high t
ratios. If any difference exists, the old fash-
ioned, superficial, and easily scored procedure
is superior rather than inferior to the more 
modern projective technique. Of course the 
data are too few and too close, and the setting 
too limited, to establish any superiority either 
way. The few significant values obtained in 
the crime group comparisons might well be 
chance sampling fluctuations. From the 40 
tests, 4 significant at the 10% level and 2 at 
the 5% level would be the modal chance 
expectation, and this is just what we find. 
The projective format employed is not rep- 
resentative of the whole range of projective 
instruments, being classified in one typology 
(Campbell, 1957a) as voluntary, direct, and 
free response, whereas the modal projectives 
are voluntary-disguised-free response tests 
such as Rorschach and the TAT. This study 
is, however, an illustration of the type of 
transition study that might well have accom- 
panied the replacement of the questionnaire 
approaches typical of personality research of 
the 1920s and 1930s by the more sophisti- 
cated projective techniques of the 1940s and 
1950s. 
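
The chance expectation cited earlier in this discussion (4 of 40 tests significant at the 10% level and 2 at the 5% level) can be checked with a short binomial sketch. It assumes the 40 tests are independent, which the correlated measures of this study do not strictly satisfy, so the figures are only a rough guide. All names are illustrative.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k 'significant' results among n independent tests."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_mode(n, p):
    """Most probable number of 'significant' results under chance alone."""
    return max(range(n + 1), key=lambda k: binomial_pmf(k, n, p))


n_tests = 40

expected_at_10 = n_tests * 0.10   # mean number expected at the 10% level
expected_at_05 = n_tests * 0.05   # mean number expected at the 5% level
mode_at_10 = binomial_mode(n_tests, 0.10)
mode_at_05 = binomial_mode(n_tests, 0.05)
```

Under these assumptions the modal counts are 4 and 2, matching the figures given in the text.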


SUMMARY 


Attitude toward Home & Parents and Atti-
tude toward Law & Justice of 79 prison in-
mates were each measured by a sentence
completion test and a structured attitude test.
As examined through a multitrait-multi-
method matrix, these tests were found to
validate each other quite satisfactorily. Inso-
far as the two measurement approaches dif-
fered at all in the efficacy with which they
differentiated crime groups among the prison-
ers, the structured tests were slightly the
better.

The term transitivity is used to identify
the set of relationships: if A > B and B > C, then
A > C. When applied to judgment data we
find that these relationships are influenced by 
such factors as the dimensionality of the 
stimulus series being judged, the distance be- 
tween stimuli, and the care with which the 
S approaches the judgment task. Frequent 
violations of transitivity often create serious 
interpretation difficulties for the investigator. 
This is true especially in applied situations 
in which attitude statements or other complex
social stimuli are scaled. Here, the investi-
gator frequently does not have information
available to him which would permit him to 
disentangle the stimulus and subject variables 
which possibly contribute to the transitivity 
of the judgments. Many of the most fre- 
quently used psychological scaling techniques 
do not permit a test of transitivity since they 
force this property on the data. For instance, 
with ranking data when A > B, and B > C,
then A > C is implied and not empirically
determined. However, the method of paired 
comparisons does permit a test of these judg- 
ment relationships since every possible com- 
parison of paired stimuli is made by the S. 

Hill (1953) reports an investigation in 
which judges were instructed to pair-compare 
groups of attitude statements which varied 
according to the scale distance intervening 
between these stimuli. Using Kendall’s (1948) 
coefficient of consistence, it was found that 
there were fewer violations of transitivity ob- 
served among those statements separated by 
the greatest scale distance. Furthermore, it 
was reported that when the judged stimulus 
groups were equated for scale separation, Ss 
who demonstrated a high number of incon- 
sistent judgments with one group of stimuli 
tended to demonstrate a high number of in- 
consistent judgments on the second group. 




1 This study is a portion of a project supported
by the Oklahoma State University Research Foun- 
dation. 


There has been little research reported fol- 
lowing Hill's study which concerns the factors 
that influence the transitivity characteristics 
of judgments. One variable which conceivably 
relates to the transitivity of judgments is the 
wording of the instructional material pre- 
sented to the S. In the method of paired com- 
parisons one may consider the instructions as 
serving as a background variable against 
which comparative judgments are expressed. 
Hence, in situations in which judgments de- 
note preferences, the instructions generally 
define the intensity of preference indicated by 
each judgment. By modifying the preference 
intensity expressed by each judgment it is 
conceivable that the dimensionality of the 
stimulus series, the distance between stimuli, 
or the Ss’ general approach to the judgment 
task is significantly modified. The purpose of 
this study is to determine the influence of 
variations in instruction wording upon the 
transitivity of paired comparison judgments. 
Instructions will be considered a background
variable against which judgments are made,
and will be varied along a social distance
dimension.


METHOD


Thirty white undergraduates served as Ss for this
experiment. Thirteen of the Ss were male students,
17 were female, and all were below the age of 25.
Ss were randomly assigned to three groups of 10,
and each was presented with 91 paired combinations
of 14 nationality group names. Each group was in-
structed to select one of each pair that was felt to
be most preferred by the average college student.
However, preference intensity was defined in terms
of social distance, and for each group a different
social distance statement was used in the instruc-
tions. For Group A, preference was defined in terms
of desirability of these nationality groups as class-
mates. Group B responded in terms of the desira-
bility of these nationality groups as members of
social and fraternal groups, and Group C with regard
to desirability as dormitory roommates. Data were
collected in groups with all groups receiving identi-
cal treatment with the exception of differences in
instructions.

TABLE 1
MEANS AND RANGES OBTAINED FROM
CIRCULAR TRIAD DATA

Group     Mean     Range

A         10.7     24
B         12.4     21
C          9.4     24

RESULTS AND DISCUSSION 


The judgments were analyzed in order to 
determine the number of circular triads found 
in each individual judgment matrix. The term 
circular triad is used by Kendall (1948) to 
refer to a violation of transitivity in judgment 
data. Table 1 summarizes the results of this 
analysis; here the mean number of inconsist- 
ent judgments and the range are presented for 
each group. A single classification analysis of 
variance performed on these data yielded a 
nonsignificant F at the .05 level of signifi- 
cance, thus indicating that the mean number 
of circular triads did not vary significantly 
among the three instruction conditions. Fur-
thermore, when each individual judgment ma-
trix was analyzed using a χ² test developed by
Kendall it was found that none of these ma- 
trices contained as many violations of transi- 
tivity as would be necessary to reject the 
hypothesis that chance factors were alone pro- 
ducing these inconsistent judgments. Refer- 
ence to Table 1 will indicate that 24 circular 
triads were the maximum observed for any 
one S, and with 14 stimuli the maximum
number possible is 112.
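
The counts used in this analysis can be reproduced with a brief sketch. It counts circular triads from each stimulus's number of wins (every triad is either circular or has exactly one member preferred to the other two), and gives the largest possible count for n stimuli, which is (n^3 - 4n)/24 for even n and (n^3 - n)/24 for odd n. The function names and the two small example matrices are illustrative and are not the study's data.

```python
from math import comb


def circular_triads(pref):
    """Count circular triads in a complete paired-comparison matrix.

    pref[i][j] is 1 if stimulus i was chosen over stimulus j, else 0.
    """
    n = len(pref)
    wins = [sum(row) for row in pref]
    # A transitive triad has exactly one stimulus that beats the other two.
    transitive = sum(comb(w, 2) for w in wins)
    return comb(n, 3) - transitive


def max_circular_triads(n):
    """Largest possible number of circular triads among n stimuli."""
    if n % 2 == 0:
        return (n**3 - 4 * n) // 24
    return (n**3 - n) // 24


def coefficient_of_consistence(pref):
    """1 for a fully transitive matrix, 0 for the maximum number of circular triads."""
    return 1 - circular_triads(pref) / max_circular_triads(len(pref))


n_stimuli = 14
n_pairs = comb(n_stimuli, 2)                  # 91 paired combinations
max_triads = max_circular_triads(n_stimuli)   # 112

# A > B, B > C, C > A: one circular triad.
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

# A perfectly ordered set of 14 stimuli: no circular triads.
ordered = [[1 if i < j else 0 for j in range(n_stimuli)] for i in range(n_stimuli)]
```

On this scale, the 24 circular triads reported as the largest count for any one S correspond to a coefficient of consistence of 1 - 24/112, or about .79.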

Torgerson (1958) points out that transi- 
tivity is a pertinent test of the unidimension- 
ality of judgments. Hence, in the light of 
the preceding analysis it would appear that 
the judgment data present some support 
for the satisfaction of the transitivity require- 
ments of a unidimensional scale. Furthermore, 
it appears that modification in the intensity 
of preference ascribed to the judgments by 
the instructions does not significantly alter 
this judgment characteristic which enters so 
importantly in the interpretation of scale
data.

Since the circular triad is derived from the 
internal characteristics of a judgment matrix,
the above results do not necessarily imply
that the scales obtained from the three groups 
will interrelate highly. It is quite possible to 
have groups of Ss who are all fairly consist- 
ent in making judgments and still have scales 
which do not correlate strongly. 

Scale values were computed for each group. 
The average scale intercorrelation was .62, 
with a range of coefficients from .58 to .63. 
This finding is not completely in agreement 
with the results of an unpublished study by 
Eggan, cited by Gulliksen (1946). This study 
reports rho values ranging from .97 to .99 
for five sets of paired comparison scale values 
obtained under five instruction wording con-
ditions which varied in terms of social dis-
tance. Gulliksen cites these results not as 
being crucial to the support of paired com- 
parisons as a measurement procedure, but as 
a demonstration of the generality and utility 
of paired comparison scale data. The results 
of this present study indicate that rather 
serious alterations in scale values accom-
panied modifications in instruction wording.


SUMMARY 


Three groups of 10 undergraduate Ss 
paired compared 14 nationality group names, 
each under one of three instruction wording 
conditions. Instructions directed Ss to select 
the preferred member of each pair, and pref- 
erence was defined in terms of three social 
distance statements. The judgment data were 
analyzed in order to determine the number 
of violations of transitivity observed under 
the three instruction conditions. The results 
indicated that there were no significant dif- 
ferences noted among the three groups. Scale 
values obtained from the three judgment con- 
ditions yielded an average intercorrelation 
of .62. 
THE RELATION OF THE VOCATIONAL PREFERENCE
INVENTORY TO THE SIXTEEN PERSONALITY
FACTOR QUESTIONNAIRE

The first report about the Holland Voca- 
tional Preference Inventory, an experimental 
personality inventory composed of occupa- 
tional titles, indicated that it differentiated 
significantly a variety of criterion groups, in- 
cluding psychiatric patients, TB patients, 
psychopaths, controls, and a number of col- 
lege student samples (Holland, 1958). In 
order to obtain additional information about 
scale validities, the Vocational Preference In- 
ventory (VPI), a short form of the HVPI, 
was intercorrelated with Cattell’s Sixteen Per- 
sonality Factor Questionnaire (16 PF) in the 
present study. The 16 PF is a desirable in- 
strument for this purpose since a number of 
its scales measure variables which appear to 
be similar to scales in the HVPI. It also
provides a means of testing many of the
hypothesized correlates of individual HVPI
scales and of confirming the assumption about
response set, or acquiescence, outlined in the
original report (Holland, 1958).


THE STUDENT SAMPLE 


As part of a three-inventory assessment 
battery, the VPI and the 16 PF were admin-
istered by mail to a sample of high school
seniors: 783 boys, 394 girls. They are 83% 
of a one-sixth random sample drawn from a 
group of 7500 students, the survivors of a 
national scholarship competition in which 
255,942 high school seniors participated. On 
the average, their Scholastic Aptitude Test 
(SAT) scores and High School Rank (HSR) 
are about 2 standard deviations above those 
of the average high school graduate. The 
educational and income levels of their parents 
are also well above the national average. 

Students who omitted more than 10 items
on the 16 PF or more than 50 items on the
VPI were excluded from the sample. The
application of these criteria resulted in a loss
of 17 students.


1 This study was partially supported by the Na-
tional Science Foundation and the Old Dominion
Foundation. I am indebted to Donald L. Thistle-
thwaite and Laura Kent for their constructive re-
views of this paper.

THE 16 PF AND THE VPI 

Form A of Cattell’s Sixteen Personality 
Factor test was used. This inventory has been 
described in a number of publications (Cat- 
tell, 1946, 1950, 1957; Cattell, Saunders, & 
Stice, 1957). 

The Vocational Preference Inventory used 
in this study is a revised edition of the HVPI 
(Holland, 1958), which consists of 263 scored 
items (occupational titles) in a pool of 300 
items. The VPI was developed by selecting 
240 of the 263 scored items in the HVPI. 
Generally, items were selected for their ability 
to differentiate between high and low scorers 
on the HVPI. The HVPI contains some over-
lapping items; but in the revision, items
formerly scored for two or more scales were
scored only for the scale in which they had
their highest discrimination index. Since the
HVPI scales have a high degree of internal
consistency (split-half correlations range from
.72 to .95, with a median of .85), it is as-
sumed that the revised scales in the VPI also
possess adequate internal consistency. Instead 
of developing a separate inventory for each 
sex, a single form of the VPI was developed 
from the male form of the HVPI; therefore, 
the VPI is probably more effective in assess- 
ing boys than girls. The VPI contains 12 
scales in all: 2 response set scales and 10 per- 
sonality scales. The response set scales are: 
(1) Acquiescence and (2) Infrequency. The 
remaining scales are: (1) Physical Activity, 
(2) Intellectuality, (3) Responsibility, (4) 
Conformity, (5) Verbal Activity, (6) Emo- 
tionality, (7) Control, (8) Aggressiveness, 
(9) Masculinity-Femininity, and (10) Status. 
RESULTS 

The VPI and 16 PF were intercorrelated, 
using the Davidoff and Goheen method for 
estimating tetrachoric correlations (Edwards,
1954). The resulting matrix of correlations
for the male and female samples is presented 
in Table 1. Since 47% and 28% of the correla-
tions for boys and girls respectively are sig-
nificant beyond the 5% level, the results
cannot be attributed to chance. The larger 
percentage of significant results for boys is 
probably due to the fact that all the inven- 
tory scales, with the exception of the Mf 
scale, were developed from male samples; 
also, boys’ occupational choices may have a 
different organization than those of girls. 
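
For readers unfamiliar with tetrachoric estimates, the sketch below shows one common hand approximation computed from a fourfold table of two dichotomized scales. It is the cosine-pi approximation, r = cos(pi / (1 + sqrt(ad/bc))), and is not the Davidoff and Goheen procedure used in this study; the cell counts and the function name are hypothetical.

```python
import math


def tetrachoric_cos_pi(a, b, c, d):
    """Cosine-pi approximation to the tetrachoric correlation.

    The fourfold table is laid out as
        a = high on both scales     b = high on first, low on second
        c = low on first, high on second     d = low on both scales
    so a and d are the agreeing cells and b and c the disagreeing cells.
    """
    if b == 0 or c == 0:
        return 1.0    # no disagreements of one kind: treated as the upper limit
    if a == 0 or d == 0:
        return -1.0   # no agreements of one kind: treated as the lower limit
    ratio = math.sqrt((a * d) / (b * c))
    return math.cos(math.pi / (1 + ratio))


# Hypothetical counts for two scales, each split into high and low scorers.
r_none = tetrachoric_cos_pi(25, 25, 25, 25)       # no association
r_positive = tetrachoric_cos_pi(40, 10, 10, 40)   # mostly agreeing cells
r_negative = tetrachoric_cos_pi(10, 40, 40, 10)   # mostly disagreeing cells
```

The approximation is most trustworthy when both scales are split near their medians, which is assumed in the example counts.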

When the significant correlations shown in 
Table 1 are abstracted for each of the VPI 
scales, these correlations generally confirm 
our earlier description of the domains meas- 
ured by the HVPI scales (Holland, 1958, 
1959a). In the following paragraphs, the find-
ings for each VPI scale are summarized; the
16 PF variables which have significant cor- 
relations with that scale are arranged under 
it in order of their absolute correlations. How- 
ever, the intercorrelations of Cattell’s B factor 
(intelligence) and the VPI scales are not 
shown. In general, these correlations indicate 
that the VPI scales have only negligible cor- 
relations with intelligence (see Table 1). 

To facilitate interpretation, we have used 
Cattell’s alternate titles and adjectival de- 
scriptions for his scales rather than his tech- 
nical terms. These simple scale synonyms are 
intended as an aid to the reader who is not 
familiar with the Personality Factors; the 
interested reader should see Cattell’s manual 
(Cattell et al., 1957) for a more complete 
understanding of the variables. In addition, 
the meaning of negative correlations has been 
clarified by using adjectives which describe 
the negative poles of the Personality Factors. 
The adjectives and scale names for the Per- 
sonality Factors, then, indicate the character- 
istics which are associated with the positive 
end of the VPI scale. 

ACQUIESCENCE SCALE (RESPONSE SET)

Boys Girls
Dominant (.28) Effeminate (.21)
Cheerful (.25) Sociable (.20)
Adventurous (.23)
Sociable (.13)
Cheerful (.15)
Group Dependent

The 16 PF correlations with the Acquies-
cence scale generally support the scale de- 
scription (Holland, 1958): “Over-responsive- 
ness may be reflected in dependence, 
aggression, euphoria, over-intraception of the 
culture, impulsivity, sociability, frankness.” 
Only the Acquiescence and I factor correla- 
tion for girls is a clear exception to the 
original scale description. 


INFREQUENCY SCALE

Boys Girls
Effeminate (.27) I Effeminate (.20)
Depressive (-.27) M
Q Self-Sufficient (.16) L
Introverted (.18)
Suspecting (.16)
Immature 15
Introverted 15
Unpretentious 14
Sociable 13


Since the Infrequency scale was modelled 
after the MMPI F scale, it is meaningful that 
the Infrequency scale discriminates between 
controls and psychiatric patients (Holland, 
1958) and that high scorers are characterized 
by the 16 PF as effeminate, depressive, im- 
mature, introverted, and suspecting. The cor- 
relation with the A factor, Sociable, is the 
only negative evidence. The Q, factor, which 
is an introversion variable associated with
nonconformity and with “seclusiveness” in 
children, seems to support an earlier assump- 
tion that the Infrequency scale is a measure 
of cultural conformity (Holland, 1959a). 

The correlates of the Acquiescence and In- 
frequency scales do not, of course, indicate 
the validity of these scales for detecting fak- 
ing or defensiveness. The 16 PF correlations 
suggest, however, that these response set or 
validity scales are associated with a number 
of the hypothesized attributes. 




PHYSICAL ACTIVITY SCALE

Boys Girls
Realistic, Aloof, stiff
masculine (-.47)
Practical (-.39)
Aloof, stiff (-.32)
Mature (.32)
Sophisticated (.25)
Realistic,
masculine
Depressive
Shy (-.17)
Persistent (.16)
(-.15)
Submissive
This cluster is generally consistent with the
clinical interpretation and conceptual defini-
tion given in the HVPI manual (Holland,
1959a):

High scorers regard themselves as . . . masculine
persons . . . think of everyday problems in practical,
concrete terms . . . prefer dealing with things rather
than people. Enjoy repetitive and well defined
work duties.

For boys, the correlations for the C and N 
factors were not anticipated by the scale defi- 
nitions. For girls, findings for the F and E 
factors may be due to unanticipated sex dif- 
ferences on the Physical Activity scale. 


INTELLECTUALITY SCALE

Boys Girls
Aloof, stiff ( 33)
Self-sufficient (.22)
Radical (.20)
Realistic,
Aloof, stiff (-.40)
Realistic,
masculine (-.28)
Radical (.19)
masculine (-.17) Shy (-.18)
Controlled (.12) Accepting (-.17)

These correlations support many of the hy- 
pothesized interpretations given in the HVPI 
manual: “Have limited number of social rela-
tionships. Are persistent, have introcepted less
of culture and . . . possess atypical values.” 
Though the findings for the I and L factors, 
for girls, were not explicitly anticipated, they 
seem meaningful in relation to the other vari- 
ables in this cluster. 


RESPONSIBILITY SCALE

Boys Girls
Sociable (.50) Sociable (.49)
Sensitive, effeminate
(.43)
Cheerful (.32)
Group Dependent
Sensitive, effeminate
(.40)
Unpretentious
(-.30)
Group Dependent (-.30)
(-.30) Adventurous (.26)
Adventurous (.26) Conservative (-.22)
Cheerful (.18) N
Conservative (-.14) .20
Dominant (.17)
Unpretentious
Dominant (.11)
Persistent (.11)


These findings indicate that the Responsi- 
bility scale is a measure of oral receptive-
ness, “that is, femininity, passivity, problem-
solving by means of feeling rather than
thinking, and dependency” (Holland, 1959a). 
The super-ego qualities implied by the scale 
designation, Responsibility, receive only bor- 
derline confirmation from the G factor. “So- 
ciability” or “oral dependency” might be 
more appropriate names for this scale. 


CONFORMITY SCALE 


Boys Girls 
Realistic, masculine M 
(—.28) G 
Sophisticated (.20) Q; 
(.18) Q: Group Dependent 
.17) (—.15) 
Group Dependent 
(—.14) 
Excitable (.11) 


Practical (—.21) 
7 


Persistent (. 

Conservative .16) 
Persistent 
Practical ( 


All the correlates for girls appear consistent 
with “conformity” and its connotations. For 
boys, three of the correlates are meaningful 
(G, M, and Q.) and three (I, N, and Q,) 
were not anticipated by the scale rationale. 

VERBAL ACTIVITY SCALE

Boys Girls
Sociable (.37)
Cheerful (.37)
Dominant (.34)
(.54)
Cheerful (.40)
Adventurous (.39)
Sociable
Adventurous (.32) Group Dependent
(-.27)
Group Dependent
(-.29)
Conservative (-.18)
Uncontrolled (-.11)
Dominant (.24)


This scale has been conceptualized as a 
measure of “oral aggressiveness” (Holland, 
1959a), an interpretation which appears con- 
sistent with the Personality Factors listed 
above. 


EMOTIONALITY SCALE

Boys Girls
Sensitive, effeminate
(.46)
Immature (
Sensitive, effeminate
(.24)
Introverted (.24)
Sociable (.19)
.29) M
Introverted (.22) A
Adventurous (.20)
Dominant (.16)
Insecure (.16)
Unpretentious (-.15)
Sociable (.14)
Undependable (-.11)
The Emotionality scale is assumed to be
a measure of “instability and anxiety” (Hol- 
land, 1959a). Except for the H and A factors 
(for boys and girls respectively), the Person- 
ality Factors above are consistent with these 
conceptions; however, some of the correlates 
of H reported by Cattell et al. (1957) appear 
congruent with the Emotionality scale. 

AGGRESSIVENESS SCALE

Boys Girls
Sociable (.37) A Sociable (.37)
Cheerful (.33) E
Adventurous (.33) F
Dominant (.28)
Cheerful (.26)
Dominant (.32) H Adventurous (.19)
Group Dependent
(-.22)
Immature 15
Uncontrolled (-.15)


The findings for the Aggressiveness scale 
correspond so closely to those obtained for 
the Verbal Activity scale that it appears de- 
sirable to delete one of these two scales from 
future editions. This finding is supported by
the correlation of .72 between the VA and
Ag scales for boys (Holland, 1958).


CONTROL SCALE

Boys Girls
Sensitive, effeminate I Sensitive, effeminate
(.28) (.26)
Depressive (-.26)
Introverted (.20)
Persistent (.18)
Sociable (.17)
Unpretentious (-.15)
Submissive (-.14)
Immature (-.11)


This scale was assumed to measure control 
and qualities associated with overcontrol of 
impulses, “namely, hypochondriasis, phobias, 
fear of physically dangerous activities, repres- 
sion, denial, and to a lesser extent passivity” 
(Holland, 1959a). The present evidence, ex- 
cept perhaps for the correlations between the 
Control scale and the A, N, and C factors, 
appears to support this description of the 
overcontrolled person. 




A 
F 
G 
QO; 


MASCULINITY-FEMININITY SCALE

Boys
Realistic, masculine
(-.55)
Practical (-.30)
Mature (.26)
Sophisticated (.24)
Confident (-.20)
Group Dependent
(-.19)
Aloof, stiff (-.18)
Cheerful (.15)
Persistent (.15)
Controlled (.11)

Girls
Realistic, masculine
(-.40)
Aloof, stiff (-.23)
Mature (.19)
Sophisticated (.18)
Self-sufficient (.15)


The I scale of the 16 Personality Factor 
test, which is a masculinity-femininity scale,
has its highest correlation with the Mf scale
of the VPI. The correlation is negative, since 
these scales are scored in opposite directions. 
The other factors form generally a cluster of 
traits which are usually attributed to mascu- 
linity. 

STATUS SCALE

Boys Girls
Sociable (.46) Sociable (.41)
Sensitive, effeminate
(.33)
Introverted (.33)
Immature (-.28)
Group Dependent
(-.24)
Unpretentious
Cheerful (.36)
Dominant (.22)
Adventurous (.22)
Sensitive, effeminate
(.20)
Group Dependent
18
(-.20)
Dominant (.19)
Adventurous (.19)
Cheerful (.16)
Insecure (.11)


With the exception of the N and O factors 
for boys, the factors in this cluster appear 
congruent with our knowledge of occupational 
status. The correlations for the F factor seem 
to support the earlier hypothesis (Holland, 
1959a) that high status scores are associated 
with positive self-evaluations and low status 
scores with negative ones. 


DISCUSSION 
Taken together, the results provide support 
for the rationale underlying the development 
of the HVPI and for the construct validity of 
its individual scales. The need for more com- 
plete and precise definitions of a number of 
HVPI scales is evident when we attempt to
relate their scale descriptions to the 16 PF 
scales. The inadequacy of some of the defini- 
tions results in a few correlational findings 
which cannot be interpreted as either positive 
or negative evidence for the construct validity 
of a given scale. The present evidence is use- 
ful, however, for clarifying the scale mean- 
ings. 

Since the VPI is assumed “to yield a broad 
range of information concerning the S’s per- 
sonal adjustment,” it is pertinent that its 
scales generally correlate more often, and
more strongly, with the first eight factors
than with the last eight factors in the
16 PF. The 16 PF scales are
arranged in order of the degree to which they 
are assumed to account for variance in the 
personality domain (Cattell, 1950); accord- 
ingly, it is clear from Table 1 that the VPI 
scales are related more closely to the most 
comprehensive scales, or the first eight 
factors. 
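
The comparison between the first eight and the last eight factors can be expressed as a simple tally. The sketch below is illustrative only: the row of correlations and the sample size of 400 are invented, the function names are hypothetical, and the critical value rests on a large-sample normal approximation rather than on whatever significance test was applied to Table 1.

```python
from math import sqrt
from statistics import NormalDist

def critical_r(n, alpha=0.05):
    """Approximate two-tailed critical |r| for sample size n.

    Uses the normal deviate in place of Student's t, which is an
    assumption that is reasonable only for large samples.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z / sqrt(z * z + n - 2)

def summarize(rs, n):
    """Return (proportion of significant r's, mean absolute r)."""
    cut = critical_r(n)
    significant = sum(1 for r in rs if abs(r) >= cut)
    return significant / len(rs), sum(abs(r) for r in rs) / len(rs)

# Hypothetical correlations of ONE scale with the 16 factors, in the
# order the factors are listed (invented values, not taken from Table 1).
row = [.17, .05, -.11, -.14, .16, .02, .22, .28,
       .04, -.03, -.15, .11, .01, -.06, .08, .03]
N = 400  # assumed sample size, for illustration only

prop_first, mean_first = summarize(row[:8], N)
prop_last, mean_last = summarize(row[8:], N)

print(f"first eight: {prop_first:.0%} significant, mean |r| = {mean_first:.2f}")
print(f"last eight:  {prop_last:.0%} significant, mean |r| = {mean_last:.2f}")
```

Applied to every row of an actual correlation table, the same tally would show whether the significant coefficients are in fact concentrated among the earlier factors.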

The correlational results obtained in this 
report appear to support those obtained in 
a comparison of criterion groups (Holland, 
1958). In that report, psychiatric patients, in 
contrast with matched controls, obtained sig- 
nificantly higher scores on the Infrequency 
scale, and lower scores on the Acquiescence, 
Physical Activity, Conformity, Verbal Activ- 
ity, Aggressiveness, and Mf scales. If the 
individual correlational clusters in the present 
study are examined with regard to their im- 
plications for personal adjustment, the mean 
differences obtained earlier appear to be 
meaningful and consistent. These results also 
confirm some of the findings of other investi- 
gators who have reported that high mechani- 
cal interest scores (Physical Activity scale) 
are associated with normality and that high 
artistic, musical, and literary interest scores 
(Emotionality scale) are associated with pa- 
thology. An excellent summary of this evi- 
dence has been provided by Patterson (1958). 

The present evidence appears to replicate 
many of the relationships found between per- 
sonality variables and occupational classes 
obtained in a classification for occupations 
in terms of personality and intelligence, which 
was developed from a review of 15 corre- 
lational matrices containing both interest 
and personality inventories (Holland, 1959b). 


John L. Holland 


This evidence was organized so that the 
personality variables associated with a given 
occupational class are ordered for each of six 
classes by using selected HVPI keys (Physi- 
cal Activity, Intellectuality, Responsibility, 
Conformity, Verbal Activity, Emotionality) 
as criteria for class membership. The 16 Per-
sonality Factors are usually clustered on the
same HVPI scales in both this study and the 
classification study. 

The results obtained here are of value aside 
from the present validation study which is 
their raison d'être. Table 1 indicates, perhaps
more effectively than any previous analysis,
that preferences for occupational titles are 
significantly related to a number of person- 
ality variables. By implication, these relation- 
ships suggest that vocational choice is, in 
part, a function of personality. 


SUMMARY 


The intercorrelations of the VPI and the 
16 PF generally provide positive evidence for 
the construct validity of the VPI and its 
rationale. Forty-seven percent and 28% of 
the correlations between these instruments 
are significant for large samples of high school
senior boys and girls, respectively; moreover,
the 16 Personality Factors usually support
the scale definitions of the VPI and their
assumed correlates.